A quantitative study on lexicon and speech acts characteristics in dialogue corpus for the application of artificial intelligence learning corpus

  • Korean Semantics
  • 2020, 70, pp.221-245
  • DOI : 10.19033/sks.2020.12.70.221
  • Publisher : The Society Of Korean Semantics
  • Research Area : Humanities > Korean Language and Literature
  • Received : October 29, 2020
  • Accepted : December 17, 2020
  • Published : December 30, 2020

Jo Kyungsun 1, KANG EUNJIN 1

1 Chonnam National University

Accredited

ABSTRACT

This paper analyzes the lexical and speech act characteristics of a dialogue corpus built for artificial intelligence learning. The corpus was divided by situation into search and reservation dialogues. For lexical characteristics, we examined lexical density and lexical diversity; for speech act characteristics, we analyzed the frequency of direct and indirect speech acts. The analysis yielded three findings. First, regarding lexical density, the hypothesis that the situation type (search or reservation) is related to the distribution of content words and function words was accepted (not rejected) based on the chi-square test. Second, to measure lexical diversity we calculated TTR and GI; the GI value was higher for the search situation than for the reservation situation, indicating that a more diverse vocabulary was used in search dialogues. Third, the search and reservation corpora differed significantly in the frequency of direct and indirect speech acts. These results can help reveal the characteristics of the language that humans use when communicating with artificial intelligence. They could also contribute to principles and guidelines for building an efficient and balanced corpus for artificial intelligence learning.
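
The two lexical diversity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not define them, so this assumes that TTR is the type-token ratio (distinct word types divided by total tokens) and that GI denotes Guiraud's index (distinct types divided by the square root of total tokens). The function name `lexical_diversity` and the sample utterances are hypothetical and are not taken from the paper's corpus or method.

```python
import math


def lexical_diversity(tokens):
    """Return (TTR, GI) for a list of word tokens.

    Assumed definitions (not confirmed by the abstract):
      TTR = types / tokens
      GI  = types / sqrt(tokens)   # Guiraud's index
    """
    n_tokens = len(tokens)
    if n_tokens == 0:
        raise ValueError("token list must not be empty")
    n_types = len(set(tokens))  # number of distinct word forms
    ttr = n_types / n_tokens
    gi = n_types / math.sqrt(n_tokens)
    return ttr, gi


# Hypothetical utterances, tokenized naively on whitespace.
search_tokens = "find a cheap hotel near the station and show the reviews".split()
reservation_tokens = "book a room for two nights and book a table for two".split()

search_ttr, search_gi = lexical_diversity(search_tokens)
reservation_ttr, reservation_gi = lexical_diversity(reservation_tokens)
print(f"search:      TTR={search_ttr:.3f}, GI={search_gi:.3f}")
print(f"reservation: TTR={reservation_ttr:.3f}, GI={reservation_gi:.3f}")
```

Dividing by the square root of the token count makes GI less sensitive to text length than TTR, which may be why a length-adjusted measure is useful when comparing sub-corpora of different sizes.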
