용어 클러스터링을 이용한 단일문서 키워드 추출에 관한 연구 (A Study on Keyword Extraction From a Single Document Using Term Clustering)

한승희 (Seunghee Han)

doi:10.4275/KSLIS.2010.44.3.155

Journal of the Korean Society for Library and Information Science
2010, 44(3), pp.155~173
Publisher : Korean Society for Library and Information Science (한국문헌정보학회)
Research Area : Interdisciplinary Studies > Library and Information Science
Received : July 19, 2010
Accepted : August 11, 2010

Seunghee Han ¹

¹Seoul Women's University (서울여자대학교)

ABSTRACT

This study applies a new keyword extraction algorithm based on term clustering to a single document. The document is divided into multiple passages, and two ways of calculating the similarity between two terms are investigated: first-order similarity and second-order distributional similarity. In the first experiment, the best clustering performance was achieved with 50-term passages and second-order distributional similarity. Based on these results, second-order distributional similarity was then applied to several keyword extraction methods that use statistical information about terms. In the second experiment, paragraph frequency and term frequency by inverse paragraph frequency were found to improve the overall performance of keyword extraction. The results therefore suggest that the algorithm satisfies the conditions that good keywords should meet.
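
The abstract names the building blocks but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes fixed-size passages, cosine similarity over term-by-passage frequency vectors as the first-order measure, cosine similarity between first-order similarity profiles as one common reading of second-order similarity, and a TF-IDF-style logarithm for term frequency by inverse paragraph frequency, with passages standing in for paragraphs. All function names are hypothetical.

```python
import math
from collections import Counter

def split_passages(tokens, size=50):
    """Cut a token list into consecutive passages of at most `size` terms."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def term_passage_vectors(passages):
    """Map each term to its vector of per-passage frequencies."""
    vocab = sorted({t for p in passages for t in p})
    counts = [Counter(p) for p in passages]
    return {t: [c[t] for c in counts] for t in vocab}

def first_order(vectors):
    """Assumed first-order measure: terms are similar if they share passages."""
    terms = list(vectors)
    return {a: {b: cosine(vectors[a], vectors[b]) for b in terms} for a in terms}

def second_order(first):
    """Assumed second-order measure: compare first-order similarity profiles."""
    terms = list(first)
    profile = {a: [first[a][b] for b in terms] for a in terms}
    return {a: {b: cosine(profile[a], profile[b]) for b in terms} for a in terms}

def tf_ipf(passages):
    """Assumed weighting: term frequency times log(N / passage frequency)."""
    n = len(passages)
    tf = Counter(t for p in passages for t in p)
    pf = Counter(t for p in passages for t in set(p))
    return {t: tf[t] * math.log(n / pf[t]) for t in tf}

# Toy document: "x" and "y" never share a passage, but both occur with "z".
passages = split_passages(["x", "z", "y", "z"], size=2)
first = first_order(term_passage_vectors(passages))
second = second_order(first)
weights = tf_ipf(passages)
```

In the toy document, `first["x"]["y"]` is zero because the two terms never co-occur, while `second["x"]["y"]` is positive because both terms are similar to `z`. This is the kind of indirect relatedness a second-order measure is meant to capture. The resulting similarity matrix could then be passed to any clustering routine.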

KEYWORDS

Term Clustering, Keyword Extraction, Single Document, Second-order Similarity, Text Mining

