An Explainable Multimodal Framework for Chest X-Ray Alert Classification Using Radiology Reports and Images

Winarno, Edy and Nur, Indah Manfaati and Karim, Abdul and Amri, Saeful and Wirdati, Ismi Elya and Adi, Prajanto Wahyu (2026) An Explainable Multimodal Framework for Chest X-Ray Alert Classification Using Radiology Reports and Images. Journal of Computing Theories and Applications, 3 (4). pp. 647-666. ISSN 3024-9104

16023-Article Text-57200-1-10-20260523.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.62411/jcta.16023

Abstract

Artificial intelligence has the potential to support radiology workflows by assisting in the identification of cases that may require additional clinical attention. However, alert-oriented medical AI systems should provide not only classification outputs but also interpretable evidence that can be reviewed and audited by clinicians. This study develops and evaluates an explainable multimodal framework for binary chest X-ray alert classification using paired radiology reports and chest X-ray images. The text branch employs TF-IDF n-gram features with a class-balanced Logistic Regression classifier, while the image branch fine-tunes a pretrained ResNet18 model. The two branches are integrated through probability-level late fusion using a validation-selected fusion weight. Explainability is implemented in a modality-specific manner: global coefficient analysis is used to identify influential textual cues, while Grad-CAM heatmaps are used to visualize salient image regions. Experiments were conducted on paired samples from the Open-i/IU X-Ray dataset using text-only, image-only, and fusion-based evaluation settings. Additional analyses include case-level complementarity analysis, bootstrap confidence intervals for ROC-AUC, shortcut-feature inspection, and qualitative Grad-CAM auditing. The results indicate that the text modality provides the dominant predictive signal under the current proxy-label setting. Late fusion produced a small descriptive improvement on the test set, increasing accuracy from 0.8533 to 0.8667, F1-score from 0.8817 to 0.8936, and ROC-AUC from 0.8936 to 0.9025 compared with the text-only baseline. However, the observed ROC-AUC improvement was not statistically conclusive based on bootstrap analysis. These findings suggest that the proposed framework is useful as a reproducible and auditable multimodal prototype, while also highlighting important limitations, including proxy-label ambiguity, potential label leakage from radiology reports, limited image-branch contribution, lack of external validation, and the need for stronger explanation and calibration assessment.
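
The probability-level late fusion and the bootstrap check on ROC-AUC described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not state how the fusion weight was selected or how the bootstrap was configured, so the selection criterion (validation ROC-AUC), the weight grid, the paired percentile bootstrap, and all function names below are assumptions made for illustration.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fuse(p_text, p_image, w):
    """Late fusion: w weights the text probability, (1 - w) the image probability."""
    return [w * t + (1.0 - w) * i for t, i in zip(p_text, p_image)]


def select_weight(y_val, p_text_val, p_image_val, grid=None):
    """Pick the fusion weight with the highest validation ROC-AUC.

    Assumption: the criterion and the 0.05-step grid are illustrative choices;
    the abstract only says the weight was validation-selected.
    """
    grid = grid or [k / 20 for k in range(21)]
    return max(grid, key=lambda w: roc_auc(y_val, fuse(p_text_val, p_image_val, w)))


def bootstrap_auc_diff(y, p_a, p_b, n_boot=1000, seed=0):
    """Percentile 95% interval for AUC(p_a) - AUC(p_b) under paired resampling.

    Assumption: a paired percentile bootstrap; the paper's exact procedure
    may differ.
    """
    if len(set(y)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        a = roc_auc(ys, [p_a[i] for i in idx])
        b = roc_auc(ys, [p_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]
```

In this sketch, the weight would be chosen on validation predictions with `select_weight`, and the fused and text-only test probabilities would then be passed to `bootstrap_auc_diff`. An interval that contains zero corresponds to the abstract's description of the ROC-AUC gain as not statistically conclusive.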

Item Type:	Article
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User:	dl fts
Date Deposited:	29 May 2026 21:57
Last Modified:	29 May 2026 21:57
URI:	https://dl.futuretechsci.org/id/eprint/188
