TY - GEN
T1 - Advancing Systematic Literature Reviews Methodology Through Topic Modeling
AU - Mekaoui, Salma
AU - Chaker, Ilham
AU - Zarghili, Arsalane
AU - Nikolov, Nikola S.
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - The benefits of conducting a systematic review (SR) within a research project are well recognized. However, an SR demands significant time and effort because it typically requires manually sifting through a large corpus of papers. In this work, we propose an application that simplifies the SR process for researchers by performing topic modeling. Our approach goes beyond simply applying topic modeling: it involves selecting the method best suited to our system's needs. We explore two widely recognized topic modeling methods, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), together with two prominent libraries that implement them, Scikit-learn and Gensim. We evaluate these methods and libraries using the cosine similarity metric. Our findings indicate that, in our research context, LDA as implemented in Scikit-learn is the most effective topic modeling method. Our study also suggests that using 10 topics, combined with stemming and lemmatization, yields better results. Building on these findings, we developed an application to facilitate SRs for researchers.
KW - LDA
KW - NMF
KW - Systematic review
KW - Topic modeling
UR - https://www.scopus.com/pages/publications/105000727284
U2 - 10.1007/978-3-031-82150-9_16
DO - 10.1007/978-3-031-82150-9_16
M3 - Conference contribution
AN - SCOPUS:105000727284
SN - 9783031821493
T3 - Communications in Computer and Information Science
SP - 200
EP - 213
BT - Intelligent Systems and Pattern Recognition - 4th International Conference, ISPR 2024, Revised Selected Papers
A2 - Bennour, Akram
A2 - Bouridane, Ahmed
A2 - Almaadeed, Somaya
A2 - Bouaziz, Bassem
A2 - Edirisinghe, Eran
PB - Springer Science and Business Media Deutschland GmbH
T2 - 4th International Conference on Intelligent Systems and Pattern Recognition, ISPR 2024
Y2 - 26 June 2024 through 28 June 2024
ER -