A Corpus-based Hybrid Model for Morphological Analysis and Part-of-Speech Tagging (형태소 분석 및 품사 부착을 위한 말뭉치 기반 혼합 모형)

Seung-Wook Lee (이승욱); Do-Gil Lee (이도길); Rim, Hae-Chang (임해창)

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2008, 13(7), pp.11~18
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science

Seung-Wook Lee ¹, Do-Gil Lee ², Rim, Hae-Chang ²

¹Graduate School of Information and Communication, Korea University (고려대학교 정보통신대학원)
²Korea University (고려대학교)

Accredited

ABSTRACT

Korean morphological analyzers generally generate multiple candidate analyses and then select the most likely one. As the number of candidates increases, the chance that the correct analysis is included in the candidate list also grows. A larger candidate list, however, increases ambiguity and thereby degrades selection performance. In this paper, we propose a new rule-based model that produces a single best analysis. The analysis rules are extracted automatically from a large part-of-speech tagged corpus, so the proposed model requires no manual rule construction, and it has shown a high analysis success rate. Furthermore, because the model produces exactly one analysis whenever it successfully analyzes a given word, it reduces both the ambiguity and the computational complexity of the candidate selection phase. When the rule-based model does not produce a successful analysis, a conventional probability-based model is used in combination with it, which improves overall analysis performance.
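The hybrid control flow described in the abstract can be illustrated with a minimal Python sketch. It rests on strong assumptions that are not taken from the paper: a "rule" is simplified to a whole-word lookup that is kept only when the word has exactly one analysis in the tagged corpus, the probabilistic model is replaced by a most-frequent-analysis stand-in, and the corpus, function names, and tag strings are hypothetical toy data. The paper's actual rule format and probability model are not specified here.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, analysis) pairs; not data from the paper.
TAGGED_CORPUS = [
    ("학교에", "학교/NNG+에/JKB"),
    ("학교에", "학교/NNG+에/JKB"),
    ("나는", "나/NP+는/JX"),
    ("나는", "나/NP+는/JX"),
    ("나는", "날/VV+는/ETM"),
]


def count_analyses(tagged_corpus):
    """Count how often each analysis was observed for each word."""
    seen = defaultdict(Counter)
    for word, analysis in tagged_corpus:
        seen[word][analysis] += 1
    return seen


def extract_rules(tagged_corpus):
    """Assumed rule form: keep a word only if the corpus shows one analysis."""
    seen = count_analyses(tagged_corpus)
    return {w: next(iter(c)) for w, c in seen.items() if len(c) == 1}


def build_fallback(tagged_corpus):
    """Stand-in for a probabilistic model: pick the most frequent analysis."""
    seen = count_analyses(tagged_corpus)

    def fallback(word):
        counts = seen.get(word)
        if not counts:
            return None  # unseen word: this toy fallback has no answer
        return counts.most_common(1)[0][0]

    return fallback


def analyze(word, rules, fallback):
    """Return (analysis, source): one rule-based result, else the fallback."""
    if word in rules:
        return rules[word], "rule"
    return fallback(word), "probabilistic"


rules = extract_rules(TAGGED_CORPUS)
fallback = build_fallback(TAGGED_CORPUS)
```

With this toy data, `analyze("학교에", rules, fallback)` is answered by the rule table with a single analysis and no candidate selection, while the ambiguous `"나는"` is passed to the fallback.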

KEYWORDS

Morphological Analysis (형태소 분석), Part-of-Speech Tagging (품사 부착), Hybrid Model (혼합 모형)

Experimental setup (from the paper's footnotes): a computer with a 3.2 GHz Pentium CPU and 16 GB of memory was used for the experiments.

This paper was written with support from the National Research Foundation of Korea.

Journal of The Korea Society of Computer and Information 2024 KCI Impact Factor : 0.81