Setiadi, De Rosal Ignatius Moses and Ojugo, Arnold Adimabua and Pribadi, Octara and Kartikadarma, Etika and Setyoko, Bimo Haryo and Widiono, Suyud and Robet, Robet and Aghaunor, Tabitha Chukwudi and Ugbotu, Eferhire Valentine (2025) Integrating Hybrid Statistical and Unsupervised LSTM-Guided Feature Extraction for Breast Cancer Detection. Journal of Computing Theories and Applications, 2 (4). pp. 536-552. ISSN 3024-9104
12698-Article Text-45247-1-10-20250505.pdf - Published Version
Available under License Creative Commons Attribution.
Abstract
Breast cancer is the most prevalent cancer among women worldwide, requiring early and accurate diagnosis to reduce mortality. This study proposes a hybrid classification pipeline that integrates Hybrid Statistical Feature Selection (HSFS) with unsupervised LSTM-guided feature extraction for breast cancer detection using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. Initially, 20 features were selected using HSFS based on Mutual Information, Chi-square, and Pearson Correlation. To address class imbalance, the training set was balanced using the Synthetic Minority Over-sampling Technique (SMOTE). Subsequently, an LSTM encoder extracted non-linear latent features from the selected features. A fusion strategy was applied by concatenating the statistical and latent features, followed by re-selection of the top 30 features. The final classification was performed using a Support Vector Machine (SVM) with an RBF kernel and evaluated using 5-fold cross-validation and a held-out test set. Experimental results showed that the proposed method achieved an average training accuracy of 98.13%, F1-score of 98.13%, and AUC-ROC of 99.55%. On the held-out test set, the model reached an accuracy of 99.30%, precision of 100%, and F1-score of 99.05%, with an AUC-ROC of 99.73%. The proposed pipeline demonstrates improved generalization and interpretability compared to existing methods such as LightGBM-PSO, DHH-GRU, and ensemble deep networks. These results highlight the effectiveness of combining statistical selection and LSTM-based latent feature encoding in a balanced classification framework.
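The abstract names SMOTE as the method used to balance the training set. As a rough illustration of what that step does, the following pure-Python sketch generates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation: the function name `smote`, the parameters `k` and `seed`, and the toy 2-D data are all assumptions made for this example, and the paper's own settings are not stated in the abstract.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; k and seed are assumed defaults.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy 2-D data (hypothetical, not WDBC): 6 majority vs 3 minority samples.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4], [0.4, 0.2]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two existing minority samples, it stays inside the region the minority class already occupies. In a pipeline like the one described, such oversampling would be applied to the training split only, so that the held-out test set contains no synthetic samples.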
| Field | Value |
|---|---|
| Item Type: | Article |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Depositing User: | dl fts |
| Date Deposited: | 10 May 2025 00:55 |
| Last Modified: | 10 May 2025 00:55 |
| URI: | https://dl.futuretechsci.org/id/eprint/111 |