An Automatic Schema Generation System based on the Contents for Integrating Web Information Sources

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2008, 13(6), pp.77-86
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science

곽준영 1, Bae Jong-Min 2

1 (주)위너스텍
2 Gyeongsang National University (경상대학교)

Accredited

ABSTRACT

Web information sources can be regarded as the largest distributed database available to users. By virtually integrating these distributed sources and treating them as a single large database, we can query it to extract information, a capability that is important for developing Web applications. To integrate the sources, a database schema must be inferred from Web documents that were designed for browsing. This paper presents a heuristic algorithm that infers an XML Schema fully automatically from semi-structured Web documents. The algorithm first extracts candidate pattern regions based on predefined structure-making tags, then selects a target pattern region using a small number of heuristic factors, and finally derives XML Schema extraction rules from the target pattern region. The schema extraction rules are expressed in XQuery, which allows a variety of application systems to be developed with open-standard XML tools. We also present experimental results on several public Web sources to show the effectiveness of the algorithm.
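
The three stages named in the abstract (candidate pattern regions, target region selection, XQuery rule derivation) can be illustrated with a minimal sketch. This is not the paper's algorithm: the tag set `STRUCTURE_TAGS`, the scoring heuristic in `score`, the `<record>` wrapper element, and the document name `page.xml` are all assumptions made for illustration, and the sketch assumes well-formed markup that has already been converted to XML before the generated query is run.

```python
from html.parser import HTMLParser

# Assumed set of "structure-making" tags; the paper's actual list is not given here.
STRUCTURE_TAGS = {"table", "ul", "ol", "dl"}
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text_len = 0  # number of text characters in this subtree


class TreeBuilder(HTMLParser):
    """Builds a simple element tree; assumes properly nested tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        n = len(data.strip())
        node = self.current
        while node is not None:  # propagate text length to all ancestors
            node.text_len += n
            node = node.parent


def candidate_regions(root):
    """Yield (node, child_tag, count) for structure tags with repeated children."""
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.tag in STRUCTURE_TAGS and node.children:
            counts = {}
            for child in node.children:
                counts[child.tag] = counts.get(child.tag, 0) + 1
            child_tag, count = max(counts.items(), key=lambda kv: kv[1])
            if count >= 2:
                yield node, child_tag, count


def score(region):
    """Illustrative heuristic only: repetition count times amount of text."""
    node, _, count = region
    return count * node.text_len


def path_to(node):
    """Absolute path with positional predicates, e.g. /html[1]/body[1]/table[1]."""
    parts = []
    while node.parent is not None:
        same = [s for s in node.parent.children if s.tag == node.tag]
        parts.append(f"{node.tag}[{same.index(node) + 1}]")
        node = node.parent
    return "/" + "/".join(reversed(parts))


def extraction_rule(html, doc_name="page.xml"):
    """Return an XQuery FLWOR string for the best-scoring region, or None."""
    builder = TreeBuilder()
    builder.feed(html)
    regions = list(candidate_regions(builder.root))
    if not regions:
        return None
    node, child_tag, _ = max(regions, key=score)
    return (f'for $r in doc("{doc_name}"){path_to(node)}/{child_tag} '
            f'return <record>{{data($r)}}</record>')


SAMPLE = """<html><body>
<ul><li><a href="/">Home</a></li><li>About</li></ul>
<table>
<tr><td>Title A</td><td>10,000 won</td></tr>
<tr><td>Title B</td><td>12,000 won</td></tr>
<tr><td>Title C</td><td>9,000 won</td></tr>
</table>
</body></html>"""

if __name__ == "__main__":
    print(extraction_rule(SAMPLE))
```

On `SAMPLE`, the navigation list and the table are both candidate regions; the table wins under this heuristic because it repeats more often and carries more text, so the generated rule iterates over its `tr` rows.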
