Babalola, Olusola and Ojokoh, Bolanle and Boyinbode, Olutayo (2024) Comprehensive Evaluation of LDA, NMF, and BERTopic's Performance on News Headline Topic Modeling. Journal of Computing Theories and Applications, 2 (2). pp. 268-289. ISSN 3024-9104
Abstract
Topic modeling is an integral text mining component, employing diverse algorithms to uncover hidden themes within texts. This study compares the performance of prominent topic modeling techniques on news headlines, which are characterized by brevity and a specific linguistic style. Because the corpus originates from a non-native English-speaking country, the task carries an additional layer of complexity. Our research explores the feasibility of employing a committee approach for topic modeling, evaluating the efficacy and challenges of various methods in practical settings. We applied three techniques: Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and BERTopic. Each was used to create a model with a fixed number of topics (n=40). These models were then tested on approximately 150,000 news headlines. To assess topic coherence, we utilized Word2Vec, human evaluators, and two large language models. Statistical tests confirmed the significance and impact of our findings. BERTopic achieved slightly higher coherence than NMF and consistently outperformed both NMF and LDA according to human and LLM evaluations. The notable disparity in LDA's performance relative to BERTopic and NMF underscores the importance of carefully selecting a topic modeling technique, as the choice can significantly influence the outcome of the analysis.
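The abstract does not state how Word2Vec was turned into a coherence score. One common choice, assumed here purely for illustration, is the mean pairwise cosine similarity between the embeddings of a topic's top words. The sketch below uses hypothetical names (`embedding_coherence`, `toy_vectors`) and hand-made two-dimensional vectors in place of a trained Word2Vec model; it is not the authors' implementation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_coherence(top_words, vectors):
    """Mean pairwise cosine similarity over a topic's top words.

    Assumption: this is one plausible embedding-based coherence measure,
    not necessarily the one used in the paper. Words without a vector
    are skipped; fewer than two known words yields 0.0.
    """
    known = [w for w in top_words if w in vectors]
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
toy_vectors = {
    "election": [1.0, 0.0],
    "senate": [1.0, 0.0],
    "vote": [0.0, 1.0],
}

tight_topic = embedding_coherence(["election", "senate"], toy_vectors)
mixed_topic = embedding_coherence(["election", "senate", "vote"], toy_vectors)
```

Under this assumed measure, a topic whose top words sit close together in embedding space scores higher than one whose words point in unrelated directions, which is the property a coherence comparison across LDA, NMF, and BERTopic would rely on.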
Item Type: | Article |
---|---|
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Depositing User: | dl fts |
Date Deposited: | 23 Nov 2024 16:46 |
Last Modified: | 25 Nov 2024 09:16 |
URI: | https://dl.futuretechsci.org/id/eprint/19 |