@article{ART002869020},
author={KHOO HYUN AH},
title={A Study on methods of Data Collection and application for Optical Character Recognition of Chinese Character in ancient book},
journal={The Journal of Study on Language and Culture of Korea and China},
issn={1738-0502},
year={2022},
number={65},
pages={43-84},
doi={10.16874/jslckc.2022..65.002}
}
TY - JOUR
AU - KHOO HYUN AH
TI - A Study on methods of Data Collection and application for Optical Character Recognition of Chinese Character in ancient book
JO - The Journal of Study on Language and Culture of Korea and China
PY - 2022
IS - 65
PB - Korean Society of Study on Chinese Language and Culture
SP - 43
EP - 84
SN - 1738-0502
AB - Digitalization has been adopted in many fields in the information age, but the digitalization of ancient Chinese characters in Korea is still at an elementary stage. The development of OCR for ancient Chinese characters in Korea was first attempted by the government in 2009. A large-scale, systematic project covering 10 million characters followed in 2020, but it too is limited in that most of the books it collected were woodblock prints. This study therefore explored data collection methods for building OCR for ancient Chinese characters and the uses of such OCR. First, building a highly accurate OCR for ancient Chinese characters requires collecting data in a variety of typefaces. Since old books are mostly written in the square style or the semicursive style of Chinese handwriting, the types of source data must be expanded to include calligraphy, artworks, and household goods. Fonts can also be collected according to printing method, such as metal type, wood type, and woodblock print. Data diversity can be secured by having the font data cover different types and block books. In addition, it is essential to collect Chinese rhyme books, Okpyeon, and dictionaries so that many different Chinese characters are included, for example 『Hongmu Jeongwun yeokhun』, 『Sasungtonghae』, 『Samwunsunghui』, and 『Gyujangjeonwun』.
The results of OCR for ancient Chinese characters can be used in translation, digital archive construction, font development, and the tourism industry. The author and publication period of a work can be ascertained through font recognition. The results can also be used in preservation studies; for example, the degree of damage can be verified against the original image.
Texts in ancient Chinese characters are a very important documentary heritage that contains Korean culture. Digitalizing this heritage will accelerate the development of the Korean humanities and contribute to the development of academic and industrial fields. It can also create a variety of jobs through the production of new content. It is hoped that the results of this study will help develop OCR for ancient Chinese characters with higher accuracy and usability in the future.
KW - OCR;Chinese character;ancient book;the style of writing Chinese characters;metal type;wood block;translation;archive;font
DO - 10.16874/jslckc.2022..65.002
ER -
KHOO HYUN AH. (2022). A Study on methods of Data Collection and application for Optical Character Recognition of Chinese Character in ancient book. The Journal of Study on Language and Culture of Korea and China, 65, 43-84.
KHOO HYUN AH. 2022, "A Study on methods of Data Collection and application for Optical Character Recognition of Chinese Character in ancient book", The Journal of Study on Language and Culture of Korea and China, no.65, pp.43-84. Available from: doi:10.16874/jslckc.2022..65.002
KHOO HYUN AH "A Study on methods of Data Collection and application for Optical Character Recognition of Chinese Character in ancient book" The Journal of Study on Language and Culture of Korea and China 65 pp.43-84 (2022) : 43.
KHOO HYUN AH. A Study on methods of Data Collection and application for Optical Character Recognition of Chinese Character in ancient book. 2022; 65 : 43-84. Available from: doi:10.16874/jslckc.2022..65.002
KHOO HYUN AH. "A Study on methods of Data Collection and application for Optical Character Recognition of Chinese Character in ancient book" The Journal of Study on Language and Culture of Korea and China no.65(2022) : 43-84. doi: 10.16874/jslckc.2022..65.002
KHOO HYUN AH. A Study on methods of Data Collection and application for Optical Character Recognition of Chinese Character in ancient book. The Journal of Study on Language and Culture of Korea and China, 65, 43-84. doi: 10.16874/jslckc.2022..65.002
KHOO HYUN AH. A Study on methods of Data Collection and application for Optical Character Recognition of Chinese Character in ancient book. The Journal of Study on Language and Culture of Korea and China. 2022; 65 43-84. doi: 10.16874/jslckc.2022..65.002