
Text Extraction and Summarization from Web News

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2007, 12(5), pp.1-10
  • Publisher : The Korean Society of Computer and Information
  • Research Area : Engineering > Computer Science

Han Kwang Rok¹, Bokkeun Sun¹, Hyoungsun Yoo²

¹ Hoseo University
² Soonchunhyang University

Accredited

ABSTRACT

Many types of information provided through the web, including news content, contain unnecessary clutter. This clutter makes it difficult to build automated information processing systems for tasks such as document summarization, extraction, and retrieval. We propose a system that extracts and summarizes news content from the web. The extraction system takes news content in HTML as input, builds an element tree similar to a DOM tree, and extracts the text from that tree while removing clutter identified by the hyperlink attributes in HTML tags. The extracted text is passed to the summarization system, which selects key sentences from it. We implement the summarization system using a co-occurrence relation graph. The summaries produced by the system are expected to be suitable for transmission to PDAs or cellular phones via message services such as SMS.
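The abstract names the two stages but not their exact rules, so the following is only a minimal Python sketch of the pipeline under stated assumptions, not the authors' method. It assumes that clutter means element-tree blocks whose text is mostly hyperlink text (a hypothetical link-text ratio threshold of 0.5), that a word's importance is its weighted degree in a sentence-level co-occurrence graph, and that a sentence's score is the sum of its word scores. Tokenization is a plain regular expression rather than the morphological analysis Korean news text would normally need. All names (`TreeBuilder`, `extract_text`, `summarize`, `extract_and_summarize`), the tag sets, and the stopword list are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from itertools import combinations

# Hypothetical tag sets and stopwords; the abstract does not give the paper's rules.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}
SKIP_TAGS = {"script", "style"}
BLOCK_TAGS = {"div", "p", "ul", "ol", "li", "table", "td", "section", "article"}
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "is", "for", "with", "by", "that", "it"}

class Node:
    """One element of a simplified, DOM-like element tree."""
    def __init__(self, tag, parent=None, is_link=False):
        self.tag, self.parent, self.is_link = tag, parent, is_link
        self.children = []  # Node objects and text strings, in document order

class TreeBuilder(HTMLParser):
    """Builds the element tree, marking <a href=...> elements as hyperlinks."""
    def __init__(self):
        super().__init__()
        self.root = self.current = Node("root")

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no content or end tag
        is_link = tag == "a" and any(name == "href" for name, _ in attrs)
        node = Node(tag, self.current, is_link)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())

def text_stats(node, in_link=False):
    """Return (total_chars, link_chars) for the visible text under node."""
    if node.tag in SKIP_TAGS:
        return 0, 0
    in_link = in_link or node.is_link
    total = link = 0
    for child in node.children:
        if isinstance(child, str):
            sub_total, sub_link = len(child), (len(child) if in_link else 0)
        else:
            sub_total, sub_link = text_stats(child, in_link)
        total, link = total + sub_total, link + sub_link
    return total, link

def extract_text(node, max_link_ratio=0.5):
    """Collect text, dropping blocks whose text is mostly hyperlink text."""
    if node.tag in SKIP_TAGS:
        return []
    if node.tag in BLOCK_TAGS:
        total, link = text_stats(node)
        if total and link / total > max_link_ratio:
            return []  # link-dominated block: menus, related-article lists, ads
    parts = []
    for child in node.children:
        parts.extend([child] if isinstance(child, str) else extract_text(child, max_link_ratio))
    return parts

def summarize(text, k=2):
    """Select k key sentences by scoring words in a co-occurrence graph."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    token_sets = [{w for w in re.findall(r"\w+", s.lower()) if w not in STOPWORDS and len(w) > 1}
                  for s in sentences]
    # Edge weight: number of sentences in which the two words co-occur.
    weight = defaultdict(int)
    for words in token_sets:
        for u, v in combinations(sorted(words), 2):
            weight[(u, v)] += 1
    # Word score: weighted degree of the word's node in the graph.
    degree = defaultdict(int)
    for (u, v), w in weight.items():
        degree[u] += w
        degree[v] += w
    # Sentence score: sum of its word scores; keep the top k in original order.
    scores = [sum(degree[w] for w in words) for words in token_sets]
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]

def extract_and_summarize(html, k=2):
    """Run extraction, then summarization; returns (clean_text, key_sentences)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    text = " ".join(extract_text(builder.root))
    return text, summarize(text, k)
```

For example, `text, summary = extract_and_summarize(html, k=3)` returns the cleaned article text and three key sentences. Summing word scores favors longer sentences; dividing each score by the sentence's token count is one simple alternative when short key sentences matter, as with SMS-length output.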
