Maulana, Muhammad Khalid and Saputro, Setyo Wahyu and Faisal, Mohammad Reza and Nugroho, Radityo Adi and Ramadhan, As’ary (2026) Enhancing Software Defect Prediction through Hybrid Multi-Filter Feature Selection and Imbalance Handling. Journal of Computing Theories and Applications, 3 (4). pp. 518-534. ISSN 3024-9104
Available under License Creative Commons Attribution.
Abstract
Software Defect Prediction (SDP) aims to identify defective modules early in the software development lifecycle to improve software quality and reduce maintenance costs. However, SDP datasets commonly suffer from high dimensionality, feature redundancy, and class imbalance, which can degrade model performance and stability. This study proposes a hybrid feature selection framework to address these challenges and enhance prediction performance. The proposed approach integrates Combined Correlation and Mutual Information (CONMI), which combines the Pearson Correlation Coefficient (PCC) and Mutual Information (MI) to capture both linear and nonlinear feature relevance. The selected features are further refined through Top-K selection, correlation-based filtering to reduce multicollinearity, and Backward Elimination (BE) to obtain an optimal feature subset. To address class imbalance, SMOTE-Tomek is applied by combining over-sampling and data cleaning techniques. Experiments are conducted on twelve NASA MDP datasets using Logistic Regression (LR) and Naïve Bayes (NB) classifiers. The results show that the proposed framework consistently achieves the best performance, with Logistic Regression combined with SMOTE-Tomek obtaining the highest average AUC of 0.7923 ± 0.0714, while NB achieves 0.7554 ± 0.0580. Statistical analysis using a paired t-test indicates that the proposed method significantly outperforms MI+SMOTE-Tomek and BE+SMOTE-Tomek for Logistic Regression, whereas no significant differences are observed for NB. In addition to improving overall classification performance (AUC), the proposed approach also enhances minority class detection, as reflected in improved Recall and F1-score. Overall, the proposed hybrid framework provides an effective and reliable solution for software defect prediction, particularly for high-dimensional and imbalanced datasets.
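The filter stages named in the abstract (a combined PCC and MI relevance score, Top-K selection, then correlation-based filtering) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact CONMI formula, so the equal-weight average of min-max-normalised |PCC| and MI used here is an assumption, as are the histogram-based MI estimate, the threshold values, and all function names (`abs_pearson`, `binned_mi`, `conmi_select`), which are hypothetical. Backward Elimination and SMOTE-Tomek are left out, and the data is synthetic, not one of the NASA MDP datasets.

```python
import numpy as np


def abs_pearson(X, y):
    """Absolute Pearson correlation between each column of X and the label y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get a score of 0
    return np.abs(Xc.T @ yc) / denom


def binned_mi(x, y, bins=10):
    """Histogram estimate (in nats) of the MI between a feature and a 0/1 label."""
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.digitize(x, edges[1:-1])  # bin index in 0..bins-1
    joint = np.zeros((bins, 2))
    np.add.at(joint, (xb, y.astype(int)), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())


def minmax(v):
    """Rescale a score vector to [0, 1]; an all-equal vector maps to zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def conmi_select(X, y, k=5, corr_threshold=0.9, bins=10):
    """Score features, keep the Top-K, then drop highly inter-correlated ones."""
    pcc = abs_pearson(X, y)
    mi = np.array([binned_mi(X[:, j], y, bins) for j in range(X.shape[1])])
    # ASSUMPTION: equal-weight combination; the paper's weighting may differ.
    score = 0.5 * minmax(pcc) + 0.5 * minmax(mi)
    top_k = np.argsort(score)[::-1][:k]  # best-scoring features first
    corr = np.abs(np.corrcoef(X[:, top_k], rowvar=False))
    kept = []  # (position within top_k, original feature index)
    for pos, feat in enumerate(top_k):
        # Keep a feature only if it is not too correlated with any kept one.
        if all(corr[pos, p] < corr_threshold for p, _ in kept):
            kept.append((pos, feat))
    return [int(f) for _, f in kept], score


# Synthetic illustration: feature 1 is a near-duplicate of feature 0,
# features 2 and 3 are pure noise.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)
x0 = y + 0.3 * rng.standard_normal(n)
x1 = x0 + 0.01 * rng.standard_normal(n)
X = np.column_stack([x0, x1, rng.standard_normal(n), rng.standard_normal(n)])

selected, score = conmi_select(X, y, k=3, corr_threshold=0.9)
print("selected feature indices:", selected)
```

In this sketch the redundant pair (features 0 and 1) both rank highly on relevance, and the correlation filter then keeps only one of them, which is the multicollinearity reduction the abstract describes. A wrapper step such as Backward Elimination and a resampler such as SMOTE-Tomek would follow on the reduced feature set.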
| Item Type: | Article |
|---|---|
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Depositing User: | dl fts |
| Date Deposited: | 24 Apr 2026 16:48 |
| Last Modified: | 24 Apr 2026 16:48 |
| URI: | https://dl.futuretechsci.org/id/eprint/178 |
