Deep Learning-based Target Masking Scheme for Understanding Meaning of Newly Coined Words (신조어의 의미 학습을 위한 딥러닝 기반 표적 마스킹 기법)

Gun-Min Nam (남건민); Namgyu Kim (김남규)

doi:10.9708/jksci.2021.26.10.157

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2021, 26(10), pp.157~165
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science
Received : September 13, 2021
Accepted : October 6, 2021
Published : October 29, 2021

Gun-Min Nam ¹, Namgyu Kim ¹

¹ Kookmin University

ABSTRACT

Deep learning is now widely used to analyze large volumes of text. Pre-trained language models, which transfer what is learned from a large general corpus to the analysis of text in a specific domain, have drawn particular attention. Among these, models based on BERT (Bidirectional Encoder Representations from Transformers) are the most widely used. Recent work seeks to improve analysis performance through further pre-training with BERT's MLM (Masked Language Model) objective. However, conventional MLM has difficulty clearly capturing the meaning of sentences that contain new words, such as newly coined words. This study therefore proposes NTM (Newly coined words Target Masking), which applies masking only to new words. In an analysis of about 700,000 movie reviews from portal 'N' using the proposed method, NTM achieved higher sentiment analysis accuracy than conventional random masking.
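The contrast between random masking and masking only new words can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace-level tokens, the function names, the 15% masking rate, and the example lexicon of newly coined words are all assumptions made for illustration, and BERT's practice of sometimes substituting a random or unchanged token instead of `[MASK]` is omitted for brevity.

```python
import random

MASK = "[MASK]"  # BERT's mask token

def random_masking(tokens, mask_prob=0.15, rng=None):
    """Conventional MLM-style masking: each token is masked independently
    with probability mask_prob (15% here is an assumed, illustrative rate)."""
    if rng is None:
        rng = random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must predict the original token
        else:
            masked.append(tok)
            labels.append(None)  # position is not scored
    return masked, labels

def target_masking(tokens, target_words):
    """Target masking in the spirit of NTM: mask only tokens that appear in
    a lexicon of newly coined words (the lexicon is a hypothetical input)."""
    masked, labels = [], []
    for tok in tokens:
        if tok in target_words:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# Hypothetical example: "꿀잼" is Korean slang for "very fun".
new_words = {"꿀잼"}
tokens = ["this", "movie", "is", "꿀잼"]
masked, labels = target_masking(tokens, new_words)
print(masked)  # only the new word is replaced by [MASK]
```

With target masking, every prediction made during further pre-training concerns a newly coined word, so the surrounding context is used to learn that word's meaning; with random masking, a new word is masked only by chance.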

KEYWORDS

Target Masking, Deep Learning, BERT, Newly Coined Words, Sentiment Analysis


Journal of The Korea Society of Computer and Information 2024 KCI Impact Factor : 0.81
