An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method (コーパス言語学の動向と成果 - テキストマイニングの技法を用いて -)

Jang, Kun-Soo (張根壽)

doi:10.14817/jlak.2025.85.51

The Japanese Language Association of Korea
Abbr : JLAK
2025, (85), pp.51~68
DOI : 10.14817/jlak.2025.85.51
Publisher : The Japanese Language Association Of Korea
Research Area : Humanities > Japanese Language and Literature
Received : June 28, 2025
Accepted : August 22, 2025
Published : September 20, 2025

Jang, Kun-Soo ¹

¹ Sangmyung University (詳明大)

Accredited

ABSTRACT

This study uses text mining to identify and describe trends and achievements in Japanese corpus linguistics. Specifically, it aims to clarify when corpora were first used in Japanese linguistics and Japanese language education, and to identify the domains that have been studied most extensively. Google Scholar was used to collect publications containing the terms “Japanese language” and “corpus,” and text mining was then performed with KH Coder on the titles of 1,117 research papers and books published between 1995 and 2024. The results are summarized below.

[1] High-frequency words were extracted from the titles of the papers and books. Corpus linguistics is most commonly applied in Japanese language education, with “learner” (150 occurrences), “native language” (111 occurrences), and “Japanese language education” (59 occurrences) among the most frequent terms.

[2] Corpora were grouped into types such as “spoken corpus,” “written corpus,” “learner corpus,” and “historical corpus,” and word frequencies were analyzed for each type. Research spans a range of domains and is most active for spoken corpora (268 occurrences) and written corpora (213 occurrences).

[3] Hierarchical cluster analysis and co-occurrence network analysis were performed to examine similarities among the top 100 extracted terms. In addition, year of publication was set as an external variable to trace trends and outcomes in corpus research over the past 30 years. Research has proceeded in the following order: parallel corpora, spoken corpora, written corpora, and Japanese learner corpora.
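The frequency counting and co-occurrence steps mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the study's KH Coder workflow: the tokenized titles are hypothetical toy data, the function names are invented for illustration, and the Jaccard coefficient is assumed here as one common similarity measure for co-occurrence networks.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-tokenized titles (illustrative only, not the study's data).
titles = [
    ["learner", "corpus", "error", "analysis"],
    ["spoken", "corpus", "learner", "particle"],
    ["written", "corpus", "collocation"],
    ["learner", "corpus", "collocation"],
]

def term_frequencies(docs):
    """Count total occurrences of each term across all titles."""
    return Counter(term for doc in docs for term in doc)

def document_frequencies(docs):
    """Count the number of titles in which each term appears at least once."""
    return Counter(term for doc in docs for term in set(doc))

def jaccard_cooccurrence(docs):
    """Return {(a, b): Jaccard coefficient} for term pairs sharing at least one title."""
    df = document_frequencies(docs)
    # Count, for each alphabetically ordered pair, the titles containing both terms.
    pair_counts = Counter(
        pair for doc in docs for pair in combinations(sorted(set(doc)), 2)
    )
    # Jaccard = |titles with both| / |titles with either|.
    return {
        (a, b): n / (df[a] + df[b] - n)
        for (a, b), n in pair_counts.items()
    }

freq = term_frequencies(titles)
edges = jaccard_cooccurrence(titles)
print(freq.most_common(3))
print(edges[("corpus", "learner")])
```

In a sketch like this, the pairs with the highest coefficients would become the edges of a co-occurrence network, and the same pairwise similarities could feed a hierarchical clustering of the top terms.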

KEYWORDS

Corpus Linguistics, Research trends, Text mining, KH Coder, Google Scholar

The Japanese Language Association of Korea 2025 KCI Impact Factor : 0.41