A Comparative Analysis of BERTopic and LDA for Topic Modeling of Korean Sleep Health Discourse on Social Media

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2026, 31(2), pp.231~239
  • DOI : 10.9708/jksci.2026.31.02.231
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : December 30, 2025
  • Accepted : January 27, 2026
  • Published : February 27, 2026

JongHwi Song 1

1 Yonsei University

Accredited

ABSTRACT

This study compares BERTopic and Latent Dirichlet Allocation (LDA) for topic modeling of Korean social media text on sleep health. A total of 8,002 blog posts were collected from Naver between March and October 2025 using nine sleep-related keywords. Both methods were applied to the same dataset and evaluated on the number of topics, noise ratio, distribution entropy, and topic coherence. BERTopic identified 9 topics with a noise ratio of 22.8%, whereas LDA yielded 6 effective topics with a substantially lower noise ratio of 0.9%. BERTopic showed higher distribution uniformity (0.852) than LDA (0.804), indicating more balanced topic assignments. LDA achieved a coherence score (C_V) of 0.5287. Cross-tabulation showed that BERTopic's "Melatonin/Hormone" topic was 84.1% concentrated in LDA's "Insomnia General" topic, indicating high consistency between the two methods for well-defined topics. This study provides methodological guidance for researchers choosing a topic modeling approach for Korean health-related text analysis.
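The abstract does not define its evaluation metrics, so the following is only a minimal sketch of how such quantities are commonly computed from per-document topic labels. It assumes that the noise ratio is the share of documents carrying an outlier label (BERTopic uses -1 for outliers), that distribution uniformity is the Shannon entropy of topic sizes normalized by log K, and that the cross-tabulation concentration is the share of one method's topic falling into the other method's most frequent topic. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

NOISE_LABEL = -1  # BERTopic's convention for outlier documents (assumed here)


def noise_ratio(labels, noise_label=NOISE_LABEL):
    """Fraction of documents assigned to the noise/outlier label."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for t in labels if t == noise_label) / len(labels)


def normalized_entropy(labels, noise_label=NOISE_LABEL):
    """Shannon entropy of topic sizes divided by log(K).

    Noise documents are excluded; 1.0 means perfectly even topic sizes.
    """
    counts = Counter(t for t in labels if t != noise_label)
    k = len(counts)
    if k < 2:
        return 0.0  # entropy is not informative with fewer than two topics
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def concentration(row_labels, col_labels, row_topic):
    """Most common column topic for row_topic, and the share of documents in it."""
    cols = Counter(c for r, c in zip(row_labels, col_labels) if r == row_topic)
    if not cols:
        raise ValueError("row_topic does not occur in row_labels")
    top, count = cols.most_common(1)[0]
    return top, count / sum(cols.values())


# Toy labels for eight documents (illustrative only, not the study's data).
bertopic_labels = [0, 0, 0, 1, 1, -1, 2, 2]
lda_labels = [3, 3, 3, 4, 3, 5, 5, 5]

print(noise_ratio(bertopic_labels))
print(normalized_entropy(bertopic_labels))
print(concentration(bertopic_labels, lda_labels, row_topic=0))
```

Under these assumptions, a value such as 84.1% would correspond to the second element returned by `concentration` for the "Melatonin/Hormone" topic. How the study defines noise for LDA, which has no built-in outlier label, is not stated in the abstract and is not modeled here.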
