@article{ART003053203,
author={Lee, Yong-Gu and Kim, SeonWook},
title={A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS},
journal={Journal of the Korean Society for Library and Information Science},
issn={1225-598X},
year={2024},
volume={58},
number={1},
pages={5--30},
doi={10.4275/KSLIS.2024.58.1.005}
}
TY - JOUR
AU - Lee, Yong-Gu
AU - Kim, SeonWook
TI - A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS
JO - Journal of the Korean Society for Library and Information Science
PY - 2024
VL - 58
IS - 1
PB - Korean Society for Library and Information Science
SP - 5
EP - 30
SN - 1225-598X
AB - This study extracts topics from experimental data using three topic modeling methods (LDA, Top2Vec, and BERTopic) and compares the characteristics of and differences between the models. The experimental data consist of 55,442 papers published in 85 library and information science journals indexed in the Web of Science (WoS). The experiment had two stages: the first set of topic modeling results was obtained using each model's default parameters, and the second by setting the same optimal number of topics for every model. In the first stage, the LDA, Top2Vec, and BERTopic models generated markedly different numbers of topics (100, 350, and 550, respectively). The Top2Vec and BERTopic models appeared to divide the topics roughly three to five times more finely than the LDA model. The models also differed substantially in the mean and standard deviation of documents per topic: the LDA model assigned many documents to a relatively small number of topics, while the BERTopic model showed the opposite trend. In the second stage, with all models generating the same 25 topics, the Top2Vec model tended to assign more documents per topic on average and showed small deviations between topics, yielding an even distribution across the 25 topics. In a comparison of similar topics across models, the LDA and Top2Vec models generated 18 similar topics (72%) out of 25, which suggests that the Top2Vec model is more similar to the LDA model. A more comprehensive comparison would require expert evaluation of whether the documents assigned to each topic are thematically accurate.
KW - Topic Modeling;LDA;Top2Vec;BERTopic;Library and Information Science
DO - 10.4275/KSLIS.2024.58.1.005
ER -
Lee Yong-Gu and Kim SeonWook. (2024). A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS. Journal of the Korean Society for Library and Information Science, 58(1), 5-30.
Lee Yong-Gu and Kim SeonWook. 2024, "A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS", Journal of the Korean Society for Library and Information Science, vol.58, no.1, pp.5-30. Available from: doi:10.4275/KSLIS.2024.58.1.005
Lee Yong-Gu, Kim SeonWook. "A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS" Journal of the Korean Society for Library and Information Science 58.1 (2024): 5-30.
Lee Yong-Gu, Kim SeonWook. A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS. 2024; 58(1), 5-30. Available from: doi:10.4275/KSLIS.2024.58.1.005
Lee Yong-Gu and Kim SeonWook. "A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS" Journal of the Korean Society for Library and Information Science 58, no.1 (2024): 5-30. doi: 10.4275/KSLIS.2024.58.1.005
Lee Yong-Gu; Kim SeonWook. A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS. Journal of the Korean Society for Library and Information Science, 58(1), 5-30. doi: 10.4275/KSLIS.2024.58.1.005
Lee Yong-Gu; Kim SeonWook. A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS. Journal of the Korean Society for Library and Information Science. 2024; 58(1): 5-30. doi: 10.4275/KSLIS.2024.58.1.005