An Extraction Method of Bibliographic Information from the US Patents: Using an HTML Parsing Technique

Han, Yoo-Jin (한유진); Seung-Woo Oh (오승우)

doi:10.3743/KOSIM.2010.27.2.007


Journal of the Korean Society for Information Management
Abbr : JKOSIM
2010, 27(2), pp.7~20
Publisher : Korean Society for Information Management
Research Area : Interdisciplinary Studies > Library and Information Science
Received : April 16, 2010
Accepted : June 13, 2010
Published : June 30, 2010

Han, Yoo-Jin ¹, Seung-Woo Oh ²

¹Sookmyung Women's University
²Seoul National University


ABSTRACT

This study proposes a method for extracting the most recent information from US patent documents. It adopts an HTML parsing technique that connects directly to the US Patent and Trademark Office (USPTO) Web page. After obtaining a list of 50 documents through a keyword search, the study presents an algorithm that uses HTML parsing to extract the patent number, the applicant, and the US patent class information. It also presents an algorithm that extracts both patents and their subsequent patents by exploiting the close links between them, a very distinctive characteristic of US patent documents. Although the proposed method has several limitations, it can effectively supplement existing databases in terms of timeliness and comprehensiveness.
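
The field extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sample markup, the row labels ("United States Patent", "Assignee", "Current U.S. Class"), the patent number, the company name, and the helper names `LabeledRowParser` and `extract_bibliographic_fields` are all hypothetical, and a real USPTO page may be laid out differently. The sketch uses only Python's standard `html.parser` module and treats the assignee as the applicant, which is also an assumption.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on a patent full-text page.
# The labels and values below are illustrative assumptions, not real data.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><td>United States Patent</td><td><b>1,234,567</b></td></tr>
</table>
<table>
  <tr><th>Assignee:</th><td><b>Example Corp.</b> (Seoul, KR)</td></tr>
  <tr><th>Current U.S. Class:</th><td><b>977/700</b>; 435/6</td></tr>
</table>
</body></html>
"""


class LabeledRowParser(HTMLParser):
    """Collect table rows with at least two cells as {label: value} pairs."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._row = None   # cell texts of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text is kept only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if len(self._row) >= 2:
                label = self._row[0].rstrip(":")
                self.fields[label] = self._row[1]
            self._row = None
            self._cell = None


def extract_bibliographic_fields(html):
    """Return patent number, applicant, and US classes from one page."""
    parser = LabeledRowParser()
    parser.feed(html)
    parser.close()
    fields = parser.fields
    raw_classes = fields.get("Current U.S. Class", "")
    return {
        "patent_number": fields.get("United States Patent"),
        "applicant": fields.get("Assignee"),  # assumption: assignee = applicant
        "us_classes": [c.strip() for c in raw_classes.split(";") if c.strip()],
    }


print(extract_bibliographic_fields(SAMPLE_HTML))
```

In practice the same function would be applied to each page in the list of documents returned by the keyword search. Fetching those pages is left out here because the abstract does not specify the request format.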

KEYWORDS

US patents, bibliographic information, extraction, HTML parsing


Journal of the Korean Society for Information Management 2025 KCI Impact Factor : 1.27
