
Android Malware Detection Using Machine Learning with SMOTE-Tomek Data Balancing

Masari, Maryam Sufiyanu and Danladi, Maiauduga Abdullahi and Onyinye, Ilori Loretta and Tohomdet, Loreta Katok (2026) Android Malware Detection Using Machine Learning with SMOTE-Tomek Data Balancing. Journal of Computing Theories and Applications, 3 (3). pp. 302-313. ISSN 3024-9104

15084-Article Text-54497-3-10-20260118.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract

This study presents a comparative analysis of four traditional machine learning algorithms (Decision Tree, Random Forest, K-Nearest Neighbors, and Support Vector Machine) for Android malware detection using the preprocessed TUANDROMD dataset, which comprises 4,465 instances and 241 features representing both static and dynamic application characteristics. Motivated by the limitations of conventional signature-based and hybrid detection methods, especially in handling imbalanced datasets and detecting emerging malware variants, the study employed SMOTE to balance the training data and support fair model evaluation. The dataset was divided into 80% training and 20% testing subsets, and the models were assessed on accuracy, precision, recall, F1-score, and ROC AUC. The proposed Random Forest model outperformed the other classifiers, achieving an accuracy of 0.993, precision of 0.992, recall of 1.000, F1-score of 0.996, and a near-perfect ROC AUC of 0.9998, surpassing state-of-the-art approaches. These results affirm the predictive capability, consistency, and robustness of the Random Forest algorithm in Android malware detection. The study concludes that base models, when combined with class-balancing techniques, provide reliable and efficient malware detection on imbalanced datasets. For future research, the study recommends exploring hybrid or ensemble frameworks that integrate Random Forest with deep learning architectures or meta-heuristic optimization techniques to further improve detection accuracy, adaptability, and resilience against rapidly evolving Android malware threats.
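As a reading aid for the metrics named in the abstract, the following minimal sketch shows how accuracy, precision, recall, and F1-score are conventionally derived from confusion-matrix counts, and checks that the reported F1-score of 0.996 is consistent with the reported precision (0.992) and recall (1.000). The function names and the example counts are hypothetical and are not taken from the paper; the authors' own evaluation code may differ.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives, with malware treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Consistency check using only the values reported in the abstract.
reported_f1 = f1_from(0.992, 1.000)
print(round(reported_f1, 3))  # 0.996

# Hypothetical counts, chosen only to illustrate the formulas.
example = classification_metrics(tp=8, fp=2, fn=0, tn=10)
print(example)
```

A recall of 1.000 means no malware sample in the test set was missed (no false negatives), so the small gap between precision and 1.0 comes entirely from benign applications flagged as malware.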

Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: dl fts
Date Deposited: 24 Jan 2026 13:28
Last Modified: 24 Jan 2026 13:28
URI: https://dl.futuretechsci.org/id/eprint/143
