The MeSH-Term Query Expansion Models using LDA Topic Models in Health Information Retrieval

  • Journal of Korean Library and Information Science Society
  • Abbr : JKLISS
  • 2021, 52(1), pp.79-108
  • DOI : 10.16981/kliss.52.1.202103.79
  • Publisher : Korean Library And Information Science Society
  • Research Area : Interdisciplinary Studies > Library and Information Science
  • Received : February 18, 2021
  • Accepted : March 20, 2021
  • Published : March 30, 2021

Sukjin You 1

1 University of Wisconsin-Milwaukee

Accredited

ABSTRACT

Information retrieval in the health field poses several challenges. Health terminology is difficult for consumers (laypeople) to understand, and formulating a query with professional terms is hard for them because such terms are more familiar to health professionals. Automatically adding health terms related to a query would help consumers find relevant information. The proposed query expansion (QE) models show how to expand a query using MeSH terms. Documents were represented by the MeSH terms found in their full text (i.e., Bag-of-MeSH), and these MeSH terms were then used to generate LDA (Latent Dirichlet Allocation) topic models. A query and the top k retrieved documents were used to find MeSH terms as topic words related to the query. LDA topic words were filtered by threshold values of topic probability (TP) and word probability (WP). In an LDA model with a specific number of topics, the threshold values were effective in increasing IR performance in terms of infAP (inferred Average Precision) and infNDCG (inferred Normalized Discounted Cumulative Gain), which are common IR metrics for large data collections with incomplete judgments. The top k words were chosen by a word score based on TP × WP and the retrieved document ranking in an LDA model with specific thresholds. Compared with the baseline, the QE model with specific thresholds for TP and WP showed improved mean infAP and infNDCG scores in an LDA model.
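
The threshold filtering and word scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function name `expand_query`, the toy probabilities, and the threshold values are all hypothetical. The abstract says only that the word score is based on TP × WP and the retrieved document ranking, so the 1/rank weighting used here is an assumption made for illustration.

```python
from collections import defaultdict


def expand_query(doc_topics, topic_words, tp_min, wp_min, k):
    """Pick up to k expansion terms from the topics of retrieved documents.

    doc_topics:  list of {topic_id: TP}, ordered by retrieval rank (best first)
    topic_words: {topic_id: {mesh_term: WP}}
    tp_min, wp_min: thresholds on topic and word probability
    """
    scores = defaultdict(float)
    for rank, topics in enumerate(doc_topics, start=1):
        for topic_id, tp in topics.items():
            if tp < tp_min:  # drop topics weakly tied to this document
                continue
            for term, wp in topic_words[topic_id].items():
                if wp < wp_min:  # drop words weakly tied to this topic
                    continue
                # Assumed combination: TP * WP, discounted by document rank.
                scores[term] += (tp * wp) / rank
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy LDA output over a Bag-of-MeSH representation.
topic_words = {
    0: {"Hypertension": 0.30, "Blood Pressure": 0.20, "Diet": 0.02},
    1: {"Diabetes Mellitus": 0.40, "Insulin": 0.25, "Hypertension": 0.05},
}
# Topic distributions of the top 2 retrieved documents, in rank order.
doc_topics = [{0: 0.7, 1: 0.3}, {0: 0.2, 1: 0.8}]

expanded = expand_query(doc_topics, topic_words, tp_min=0.25, wp_min=0.10, k=3)
print(expanded)
```

In this toy run, "Diet" is removed by the WP threshold and topic 0 of the second document is removed by the TP threshold, so neither contributes to the scores. The surviving terms would then be appended to the original query before retrieval is repeated.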
