A Study of Research on Methods of Automated Biomedical Document Classification using Topic Modeling and Deep Learning

  • Journal of the Korean Society for Information Management
  • Abbr : JKOSIM
  • 2018, 35(2), pp.63~88
  • DOI : 10.3743/KOSIM.2018.35.2.063
  • Publisher : Korean Society for Information Management
  • Research Area : Interdisciplinary Studies > Library and Information Science
  • Received : May 20, 2018
  • Accepted : June 19, 2018
  • Published : June 30, 2018

JeeHee Yuk 1 Min Song 2

1 Department of Library and Information Science, Graduate School, Yonsei University
2 Yonsei University

Accredited

ABSTRACT

This study evaluated how classification performance differs across feature selection methods (an LDA topic model and Doc2Vec, which is based on deep-learning word embeddings), feature corpus sizes, and classification algorithms. In addition, to find the feature corpus that yields high classification performance, experiments were conducted with feature corpora composed from different locations within the document and with adjusted feature corpus sizes. Finally, the experiments using deep learning evaluated training frequency and the specific information considered for context inference. This study constructed a biomedical document dataset, Disease-35083, consisting of biomedical scholarly documents provided by PMC and categorized by disease category. The study verifies which type and size of feature corpus produce the highest performance, and also suggests feature corpora that can be extended to specific features, as shown by their efficiency in training time. Additionally, it compares deep learning with existing methods and suggests an appropriate method for each classification environment.
