A Robust Model for Phishing URL Classification and Intrusion Detection using Machine Learning Techniques

Muhammad Imran Ghafoor; Paras Pervaiz; Shumaila Hussain; Saima Tareen; Mehmood Baryalai; Shariqa Fakhar; Shah Noor

doi:10.62019/w701fe48

Authors

Muhammad Imran Ghafoor, Department of Information Security, PUCIT Punjab University, Lahore.
Paras Pervaiz, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan.
Shumaila Hussain, Department of Computer Science, Sardar Bahadur Khan Women’s University, Quetta, Pakistan.
Saima Tareen, Computer Science Department, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan.
Mehmood Baryalai, Department of IT, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan.
Shariqa Fakhar, Computer Science Department, Sardar Bahadur Khan Women’s University, Quetta, Pakistan.
Shah Noor, Computer Science Department, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta, Pakistan.

Abstract

Phishing is one of the most prevalent and dangerous online threats. Attackers deceive internet users into disclosing personal information, such as passwords, login credentials, and credit card numbers, which is then frequently used against them. Victims are often sent phishing URLs (Uniform Resource Locators) via email. These URLs lead users to fraudulent websites used for phishing, spam, drive-by download attacks, and other malicious activity. Accurately classifying each URL as malicious or legitimate is therefore critical, because it keeps users from visiting harmful websites in the first place. To classify phishing URLs and recognize intrusion attacks, we propose a deep neural network-based method. Data were drawn from three sources: Kaggle, PhishTank, and Alexa. First, phishing URLs are classified by a Support Vector Machine (SVM) using Term Frequency-Inverse Document Frequency (TF-IDF) features. Second, intrusions are detected using a deep neural network (DNN). Finally, we evaluate the proposed model against previous approaches. Our results indicate that the SVM with TF-IDF features achieves an accuracy of 97.14% and a false positive rate of 2.8%. The model's intrusion detection predictions on validation data yielded promising results: we achieved an F1 score of 5.873% and, with the exception of NMAP and a few other attacks, an accuracy greater than 95%. The main contributions of this study are: 1) improving phishing URL classification by combining SVM and TF-IDF, 2) utilizing a DNN model for efficient intrusion detection, and 3) conducting a thorough evaluation across multiple datasets to illustrate the reliability and robustness of the proposed method.
The experimental findings indicate that the proposed model considerably strengthens cybersecurity defenses, outperforming existing approaches in terms of accuracy, false positive rate, and detection precision.
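To make the TF-IDF step concrete, the following is a minimal sketch of how URLs might be turned into TF-IDF feature vectors before being passed to a classifier such as an SVM. The abstract does not state the tokenizer or the exact TF-IDF variant used, so the regular-expression tokenizer, the unsmoothed weighting tf * log(N / df), the function names, and the example URLs are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (an assumed tokenizer)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def tfidf(urls):
    """Return one {token: weight} dict per URL, using tf * log(N / df).

    This unsmoothed variant is an assumption; libraries often add smoothing.
    """
    docs = [Counter(tokenize(u)) for u in urls]
    n_docs = len(docs)
    # Document frequency: the number of URLs in which each token appears.
    df = Counter(tok for doc in docs for tok in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            tok: (count / total) * math.log(n_docs / df[tok])
            for tok, count in doc.items()
        })
    return vectors


# Hypothetical example URLs, not taken from the paper's datasets.
urls = [
    "http://example.com/login",
    "http://secure-login.example.net/verify/account",
    "http://example.org/about",
]
vectors = tfidf(urls)
```

Tokens that occur in every URL, such as "http" here, receive a weight of zero, while tokens that are rare across the collection are weighted more heavily. This is why TF-IDF can surface the unusual substrings that tend to distinguish phishing URLs from legitimate ones.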