The Unsupervised Learning-based Language Modeling of Word Comprehension in Korean (한국어 단어 이해에 관한 비지도 학습 기반 언어 모델링)

Euhee Kim (김유희)

doi:10.9708/jksci.2019.24.11.041

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2019, 24(11), pp.41~49
DOI : 10.9708/jksci.2019.24.11.041
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science
Received : October 7, 2019
Accepted : November 7, 2019
Published : November 29, 2019

Euhee Kim ¹

¹ Shinhan University (신한대학교)

KCI Accredited

ABSTRACT

We build an unsupervised machine learning-based language model that estimates the amount of information needed to process words composed of subword-level morphemes and syllables. We then investigate whether the reading times of words, which reflect their morphemic and syllabic structure, can be predicted by an information-theoretic measure such as surprisal. Specifically, the proposed Morfessor-based unsupervised model is first trained on a large dataset of sentences from the Sejong Corpus and is then applied to estimate the information-theoretic measure for each word in a test set of Korean words. The reading times for the test words are taken from the Korean Lexicon Project (KLP) database. Comparing the information-theoretic measures of these words with their reading times using a linear mixed-effects model reveals a reliable correlation between surprisal and reading time. We conclude that surprisal is positively related to processing effort (i.e., reading time), confirming the surprisal hypothesis.
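Surprisal is the negative log probability of an item, here a word w: surprisal(w) = -log2 P(w), measured in bits. The Python sketch below illustrates one way a Morfessor-style morph model can yield a word-level surprisal. It assumes a simple unigram morph model in which P(w) is the product of the probabilities of the morphs in w's most probable segmentation, found with Viterbi search over syllable positions. The counts in `MORPH_COUNTS` and the names `morph_cost` and `viterbi_segment` are hypothetical toy values and helpers, not estimates from the Sejong Corpus and not the API of the Morfessor 2.0 package; Morfessor Baseline also weighs a lexicon cost during training, which this sketch omits.

```python
import math

# Hypothetical toy morph counts standing in for a lexicon learned from a corpus.
# These are NOT estimates from the Sejong Corpus.
MORPH_COUNTS = {
    "학교": 40, "학": 5, "교": 5, "에서": 30, "에": 20, "서": 10,
    "먹": 15, "었": 25, "다": 60,
}
TOTAL = sum(MORPH_COUNTS.values())


def morph_cost(morph):
    """Cost of one morph in bits: -log2 P(morph) under a unigram model.

    Unknown morphs get infinite cost (a simplifying assumption; a real
    model would smooth or back off instead).
    """
    count = MORPH_COUNTS.get(morph)
    if count is None:
        return math.inf
    return -math.log2(count / TOTAL)


def viterbi_segment(word, max_len=4):
    """Return (best_segmentation, surprisal_in_bits) for `word`.

    Slicing a string of precomposed Hangul indexes whole syllables, so this
    sketch can only place morph boundaries between syllables.
    best[i] is the minimum total cost of segmenting word[:i];
    back[i] is the start index of the last morph in that segmentation.
    """
    n = len(word)
    best = [math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            cost = best[start] + morph_cost(word[start:end])
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    if math.isinf(best[n]):
        raise ValueError(f"no segmentation found for {word!r}")
    # Follow the back pointers from the end of the word to recover the morphs.
    morphs, end = [], n
    while end > 0:
        start = back[end]
        morphs.append(word[start:end])
        end = start
    morphs.reverse()
    # The summed morph costs equal -log2 of the product of morph probabilities.
    return morphs, best[n]


if __name__ == "__main__":
    for w in ["학교에서", "먹었다"]:
        segmentation, surprisal = viterbi_segment(w)
        print(w, segmentation, round(surprisal, 2))
```

With these toy counts, 학교에서 is segmented as 학교 + 에서 and 먹었다 as 먹 + 었 + 다, and 먹었다 receives the higher surprisal. Under the surprisal hypothesis, such a word would be predicted to take longer to read; the paper tests this kind of relation against KLP reading times with a linear mixed-effects model.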

KEYWORDS

Unsupervised learning, Morfessor, Surprisal, Lexical processing, Word recognition

REFERENCES

[confproc] A. J / 2015 / What Your Username Says About You / Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing : 2302~2307

[thesis] M. Creutz / 2006 / Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0 / Helsinki University of Technology

[journal] S. Virpioja / 2018 / Using Statistical Models of Morphology in the Search for Optimal Units of Representation in the Human Mental Lexicon / Cognitive Science 42 : 939~973

[journal] M. Lehtonen / 2019 / Statistical models of morphology predict eye-tracking measures during visual word recognition / Memory & Cognition 47(7) : 1245~1269

[book] G. Booij / 2012 / The Grammar of Words: An Introduction to Linguistic Morphology. Oxford Textbooks in Linguistics / OUP Oxford

[web] / Sejong-Corpus / http://ithub.korean.go.kr/user/main.do

[web] / Kkma / http://kkma.snu.ac.kr/documents/?doc=postag

[web] / UTagger / http://nlplab.ulsan.ac.kr/doku.php?id=utagger

[web] / Khaiii / https://tech.kakao.com/2018/12/13/khaiii/

[journal] S. Virpioja / 2013 / Morfessor 2.0: Python Implementation and Extensions for Morfessor Baseline / Aalto University publication series SCIENCE + TECHNOLOGY 25 : 38~

[journal] A. Viterbi / 1967 / Error bounds for convolutional codes and an asymptotically optimum decoding algorithm / IEEE Transactions on Information Theory 13(2) : 260~269

[web] / Korean Lexicon Project / http://klexicon.org

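[journal] 이광오 / 2017 / The Korean Lexicon Project: A Lexical Decision Study on 30,930 Korean Words and Nonwords / The Korean Journal of Cognitive and Biological Psychology (한국심리학회지: 인지 및 생물) / The Korean Society for Cognitive and Biological Psychology 29(4) : 395~410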

[journal] 김유희 / 2018 / A Deeping Learning-based Article- and Paragraph-level Classification / Journal of The Korea Society of Computer and Information / The Korean Society Of Computer And Information 23(11) : 31~41

[journal] 박지현 / 2019 / An Analysis of Instagram Hashtags Related to the Exhibitions in Korea / Journal of The Korea Society of Computer and Information / The Korean Society Of Computer And Information 24(3) : 49~56


Journal of The Korea Society of Computer and Information 2024 KCI Impact Factor : 0.81

KCI Citation Counts (5)