Okpor, Margaret Dumebi and Aghware, Fidelis Obukohwo and Akazue, Maureen Ifeanyi and Eboka, Andrew Okonji and Ako, Rita Erhovwo and Ojugo, Arnold Adimabua and Odiakaose, Christopher Chukwufunaya and Binitie, Amaka Patience and Geteloma, Victor Ochuko and Ejeh, Patrick Ogholuwarami (2024) Pilot Study on Enhanced Detection of Cues over Malicious Sites Using Data Balancing on the Random Forest Ensemble. Journal of Future Artificial Intelligence and Technologies, 1 (2). pp. 109-123. ISSN 3048-3719
10.62411.faith.2024-14.pdf - Published Version
Abstract
The digital revolution has rippled across society, with a wide range of web content shared online to promote monetization and asset exchange, and with clients constantly seeking improved alternatives at lower cost to meet their value demands. From item upgrades to replacements, businesses rely on retention strategies to curb customer attrition. The advent of smartphones has brought mobility, ease of access, and portability, which have in turn driven their adoption while exposing user devices to vulnerabilities, since they are quite susceptible to phishing. With some users more susceptible than others because of their online presence and personality traits, studies have sought to reveal the lures/cues that adversaries exploit to improve phishing success, and to classify web content as genuine or malicious. Our study explores the tree-based Random Forest to identify phishing cues via sentiment analysis on a phishing website dataset scraped from user accounts on social network sites. The dataset is scraped via Python Google Scrapper and divided into train/test subsets, and contents are classified as genuine or malicious with the aid of data balancing and feature selection techniques. With Random Forest as the machine learning model of choice, the results show that the ensemble yields a prediction accuracy of 97% with an F1-score of 98.19%, correctly classifying 2089 instances and misclassifying 85 instances of the test dataset.
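As a quick arithmetic check on the counts quoted in the abstract, accuracy can be recomputed as the share of correctly classified instances. The sketch below assumes that the 2089 correct and 85 incorrect instances together make up the entire test set, which the abstract does not state explicitly; the function name `accuracy_from_counts` is hypothetical and used only for illustration.

```python
def accuracy_from_counts(correct: int, incorrect: int) -> float:
    """Return accuracy as correct / (correct + incorrect)."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("at least one classified instance is required")
    return correct / total


# Counts taken from the abstract; treating them as the full test set is an assumption.
acc = accuracy_from_counts(correct=2089, incorrect=85)
print(f"accuracy = {acc:.4f} ({acc * 100:.2f}%)")
```

Under that assumption the counts give roughly 96.1%, close to the rounded 97% the abstract reports. The F1-score cannot be recovered the same way, because it needs the per-class breakdown of true positives, false positives, and false negatives.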
Item Type: | Article |
---|---|
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Depositing User: | dl fts |
Date Deposited: | 29 Nov 2024 02:24 |
Last Modified: | 29 Nov 2024 02:24 |
URI: | https://dl.futuretechsci.org/id/eprint/60 |