A quantitative study on lexicon and speech acts characteristics in dialogue corpus for the application of artificial intelligence learning corpus (인공지능 학습용 말뭉치의 어휘‧화용적 특징에 대한 계량적 연구)

Jo Kyungsun (조경순); KANG EUNJIN (강은진)

doi:10.19033/sks.2020.12.70.221

Korean Semantics
2020, Vol. 70, pp. 221-245
Publisher : The Society Of Korean Semantics
Research Area : Humanities > Korean Language and Literature
Received : October 29, 2020
Accepted : December 17, 2020
Published : December 30, 2020

Affiliation (both authors): Chonnam National University (전남대학교)

KCI accredited journal

ABSTRACT

This paper analyzes the lexical and speech act characteristics of dialogue corpora built for training artificial intelligence. The corpus was divided by situation into search and reservation dialogues. Lexical characteristics were examined through lexical density and lexical diversity, and speech act characteristics through the frequency of direct and indirect speech acts. The results are as follows. First, according to the chi-square test, the hypothesis that corpus type (search vs. reservation) is associated with the use of content words and function words (lexical density) was accepted. Second, lexical diversity was measured with the type-token ratio (TTR) and the Guiraud index (GI); the GI was higher in the search situation than in the reservation situation, indicating that more diverse vocabulary was used in search dialogues. Third, the search and reservation corpora differed significantly in the frequency of direct and indirect speech acts. These findings shed light on the linguistic expressions humans use when communicating with artificial intelligence, and they could contribute to principles and guidelines for building efficient, balanced corpora for artificial intelligence training.
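The three measures named in the abstract are straightforward to compute once a corpus has been tokenized. The following is a minimal Python sketch under stated assumptions, not the authors' procedure: it assumes the dialogues are already segmented into tokens (for Korean, typically by a morphological analyzer), defines TTR as types divided by tokens and the Guiraud index as types divided by the square root of tokens, and runs an uncorrected Pearson chi-square test on a 2×2 contingency table, such as corpus type (search, reservation) by word class (content, function) or by speech act type (direct, indirect). The helper names `ttr`, `guiraud_index`, and `chi_square_2x2`, and all tokens and counts, are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def ttr(tokens: Sequence[str]) -> float:
    """Type-token ratio: distinct types divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def guiraud_index(tokens: Sequence[str]) -> float:
    """Guiraud index: distinct types divided by the square root of total tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def chi_square_2x2(table: Sequence[Sequence[int]]) -> tuple[float, float]:
    """Pearson chi-square test of independence on a 2x2 contingency table.

    Returns (statistic, p_value). No Yates continuity correction is applied.
    All row and column totals must be positive. With df = 1 the chi-square
    survival function equals erfc(sqrt(statistic / 2)).
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2x2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical pre-tokenized utterances, for illustration only.
    search_tokens = "근처 맛집 알려 줘 근처 카페 알려 줘".split()
    print("TTR:", ttr(search_tokens))
    print("GI:", guiraud_index(search_tokens))

    # Hypothetical counts. Rows: search, reservation; columns: content, function words.
    stat, p = chi_square_2x2([[60, 40], [45, 55]])
    print(f"chi2 = {stat:.4f}, p = {p:.4f}")
```

With the hypothetical table above, the statistic is about 4.51, which exceeds the df = 1 critical value of 3.84 at α = 0.05. `scipy.stats.chi2_contingency` performs the same test but applies Yates' continuity correction to 2×2 tables by default (`correction=True`), so its statistic will be smaller unless that option is disabled. TTR falls as texts grow longer, which is one reason a length-adjusted measure such as the Guiraud index is reported alongside it when corpora differ in size.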

KEYWORDS

artificial intelligence learning corpus, interactive corpus, lexical characteristics, speech act characteristics, lexical density, lexical diversity, direct speech acts, indirect speech acts, type-token ratio, Guiraud index, chi-square test

Korean Semantics 2024 KCI Impact Factor : 0.96

KCI Citation Counts (4)
