Investigating a SMOTE-Tomek Boosted Stacked Learning Scheme for Phishing Website Detection: A Pilot Study

Ugbotu, Eferhire Valentine and Emordi, Frances Uchechukwu and Ugboh, Emeke and Anazia, Kizito Eluemunor and Odiakaose, Christopher Chukwufunaya and Onoma, Paul Avwerosuoghene and Idama, Rebecca Okeoghene and Ojugo, Arnold Adimabua and Geteloma, Victor Ochuko and Oweimieotu, Amanda Enaodona and Aghaunor, Tabitha Chukwudi and Binitie, Amaka Patience and Odoh, Anne and Onochie, Chris Chukwudi and Ezzeh, Peace Oguguo and Eboka, Andrew Okonji and Agboi, Joy and Ejeh, Patrick Ogholuwarami (2025) Investigating a SMOTE-Tomek Boosted Stacked Learning Scheme for Phishing Website Detection: A Pilot Study. Journal of Computing Theories and Applications, 3 (2). pp. 145-159. ISSN 3024-9104

Text
14472-Article Text-51368-1-10-20251001.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.62411/jcta.14472

Abstract

The daily exchange of information over the Internet has eased the proliferation of resources and improved the accessibility, availability, and interoperability of connected devices. The recent spread of smartphones and other computing devices has further advanced features such as miniaturization, portability, ease of data access, and mobility. It has also given rise to adversarial attacks that target network infrastructures and exploit interconnected, shared resources in order to compromise an unsuspecting user's device. The increased susceptibility to, and success rate of, these attacks has been traced to users' personality traits and behaviours, which render them repeatedly vulnerable to such exploits, especially those delivered as malicious content via spoofed websites. Our study proposes a stacked, transfer learning approach to classify the malicious content that adversaries deliver through spoofed phishing websites. The stacked approach uses three base classifiers (Cultural Genetic Algorithm, Random Forest, and Korhonen Modular Neural Network) whose outputs serve as input to an XGBoost meta-learner. Two major challenges with learning schemes are the selection of appropriate features for estimation and the imbalanced nature of the dataset, in which the target class is often under-represented. Our study addressed the dataset imbalance using the SMOTE-Tomek technique, and selected predictors using relief rank feature selection. Results show that our hybrid yields F1 0.995, Accuracy 0.997, Recall 0.998, Precision 1.000, AUC-ROC 0.997, and Specificity 1.000 on the 2,764 cases of its held-out test dataset, and that it outperformed benchmark ensembles. The proposed model was evaluated on the UCI Phishing Website dataset and effectively classified phishing content (cues and lures) on websites.
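
The abstract names SMOTE-Tomek as the resampling step but does not say how it was implemented. As a rough illustration only, the sketch below shows the general idea in plain Python: SMOTE interpolates new minority-class points between existing minority neighbours, and Tomek-link cleaning then removes pairs of opposite-class points that are each other's nearest neighbour. The function names (`smote`, `tomek_links`, `smote_tomek`), the parameters `k` and `seed`, and the choice to drop both members of every link are assumptions made for this example and are not taken from the paper. In practice a maintained library such as imbalanced-learn (`imblearn.combine.SMOTETomek`) would normally be used instead.

```python
import random
from math import dist


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (illustrative)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(points, labels):
    """Return index pairs (i, j) that are mutual nearest neighbours
    and carry different class labels."""
    def nearest(i):
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )

    links = []
    for i in range(len(points)):
        j = nearest(i)
        if i < j and labels[i] != labels[j] and nearest(j) == i:
            links.append((i, j))
    return links


def smote_tomek(points, labels, minority_label=1, seed=0):
    """Oversample the minority class up to the majority size, then drop
    both members of every Tomek link (an assumption of this sketch)."""
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    n_new = (len(points) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_new, 0), seed=seed)
    all_points = list(points) + new_points
    all_labels = list(labels) + [minority_label] * len(new_points)
    drop = {i for pair in tomek_links(all_points, all_labels) for i in pair}
    kept_points = [p for i, p in enumerate(all_points) if i not in drop]
    kept_labels = [y for i, y in enumerate(all_labels) if i not in drop]
    return kept_points, kept_labels
```

A call such as `smote_tomek(X, y, minority_label=1)` on a list of feature tuples `X` and labels `y` returns the resampled points and labels. Resampling of this kind is normally applied to the training split only, so that the held-out test set keeps its original class distribution.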

Item Type:	Article
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User:	dl fts
Date Deposited:	02 Oct 2025 02:04
Last Modified:	02 Oct 2025 02:04
URI:	https://dl.futuretechsci.org/id/eprint/131
