@article{ART003244561,
author={Jang, Kun-Soo},
title={An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method},
journal={The Japanese Language Association of Korea},
issn={1229-7275},
year={2025},
number={85},
pages={51-68}
}
TY - JOUR
AU - Jang, Kun-Soo
TI - An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method
JO - The Japanese Language Association of Korea
PY - 2025
IS - 85
PB - The Japanese Language Association Of Korea
SP - 51
EP - 68
SN - 1229-7275
AB - This study uses text mining to identify and describe the trends and outcomes in the field of “Japanese corpus linguistics.” It specifically aims to clarify when corpora were first utilized within Japanese linguistics and Japanese language education, as well as to highlight the domains that have been studied most extensively. As part of the methodology, the Google Scholar search tool was employed to gather research results that included the terms “Japanese language” and “corpus.” Text mining was then performed using KH Coder on the titles of 1,117 research papers and books published between 1995 and 2024. A summary of the analytical results is provided below.
[1] Text mining was used to extract high-frequency words from the titles of academic papers and books. Corpus linguistics is most commonly applied in the field of “Japanese language education,” with the terms “learner” (150 occurrences), “native language” (111 occurrences), and “Japanese language education” (59 occurrences) being among the most frequent.
[2] The corpora were categorized into several types: “spoken corpus,” “written corpus,” “learner corpus,” “historical corpus,” and others, and the frequency of word occurrences was analyzed for each category. The results show that research spans various domains and is most active for spoken corpora (268 occurrences) and written corpora (213 occurrences).
[3] Hierarchical cluster analysis and co-occurrence network analysis were performed to examine the similarities among the top 100 extracted terms. Additionally, the year of publication was set as an external variable to trace the trends and results of corpus studies over the past 30 years. Research has proceeded in the following order: parallel corpora, spoken corpora, written corpora, and Japanese learner corpora.
KW - Corpus Linguistics;Research trends;Text mining;KH Coder;Google Scholar
ER -
Jang, Kun-Soo. (2025). An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method. The Japanese Language Association of Korea, 85, 51-68.
Jang, Kun-Soo. 2025, "An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method", The Japanese Language Association of Korea, no.85, pp.51-68.
Jang, Kun-Soo "An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method" The Japanese Language Association of Korea 85 pp.51-68 (2025) : 51.
Jang, Kun-Soo. An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method. 2025; 85 : 51-68.
Jang, Kun-Soo. "An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method" The Japanese Language Association of Korea no.85(2025) : 51-68.
Jang, Kun-Soo. An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method. The Japanese Language Association of Korea, 85, 51-68.
Jang, Kun-Soo. An Analysis of Trends and Achievements in Corpus Linguistics: Using the Text Mining Method. The Japanese Language Association of Korea. 2025; 85 51-68.