A Study on Extracting News Contents from News Web Pages

  • Journal of the Korean Society for Information Management
  • Abbr : JKOSIM
  • 2009, 26(1), pp.305~320
  • DOI : 10.3743/KOSIM.2009.26.1.305
  • Publisher : Korean Society for Information Management
  • Research Area : Interdisciplinary Studies > Library and Information Science
  • Received : February 17, 2009
  • Accepted : March 2, 2009
  • Published : March 30, 2009

Yong-Gu Lee 1

1 University of Pittsburgh

Accredited

ABSTRACT

News pages provided through the web contain unnecessary information, which lowers the performance and efficiency of news processing systems. This study proposes news content extraction methods based on sentence identification and on the block-level tags of news web pages. To obtain optimal performance, combinations of these methods were applied. The extraction method that applied sentence identification and removed hyperlink text from web pages performed well. It performed better still when combined with the extraction method based on block-level tags. Extraction methods that used sentence identification were effective in raising extraction recall.
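
The combination described in the abstract (block-level segmentation, hyperlink text removal, and sentence identification) can be illustrated with a minimal sketch. This is not the paper's implementation: the tag sets, the `looks_like_sentence` heuristic, the `min_words` threshold, and all function and class names are assumptions made for illustration only, since the abstract does not specify them.

```python
import re
from html.parser import HTMLParser

# Assumed set of block-level tags; the abstract does not list the ones used in the study.
BLOCK_TAGS = {"p", "div", "td", "li", "br", "h1", "h2", "h3", "table", "tr", "ul", "ol"}
# Text inside these tags is dropped: hyperlink text plus non-visible content.
SKIP_TAGS = {"a", "script", "style"}

# Hypothetical sentence test: terminal punctuation, optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r"[.!?]['\")\]]?$")


class BlockTextParser(HTMLParser):
    """Splits a page into text blocks at block-level tags, skipping hyperlink text."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip_depth = 0

    def _flush(self):
        # Collapse whitespace and store the block if anything is left.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def looks_like_sentence(text, min_words=5):
    # Assumed heuristic: enough words and sentence-final punctuation.
    return len(text.split()) >= min_words and bool(SENTENCE_END.search(text))


def extract_news_content(html):
    """Return the text blocks that pass the sentence heuristic."""
    parser = BlockTextParser()
    parser.feed(html)
    parser.close()
    return [block for block in parser.blocks if looks_like_sentence(block)]
```

In this sketch, navigation menus and "read more" links tend to be discarded because their text sits inside `<a>` elements, while short non-sentence fragments such as section labels fail the sentence heuristic. One limitation is that links embedded in the middle of a body sentence also lose their text, and the word-count test presumes whitespace-delimited words, so both would need tuning for a real corpus.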

This paper was written with support from the National Research Foundation of Korea.