Impact of Word Embedding Methods on Performance of Sentiment Analysis with Machine Learning Techniques

PARK HO YEON (박호연); Kim, Kyoung Jae (김경재)

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2020, 25(8), pp.181~188
DOI : 10.9708/jksci.2020.25.08.181
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science
Received : July 28, 2020
Accepted : August 11, 2020
Published : August 31, 2020

PARK HO YEON ¹, Kim, Kyoung Jae ¹

¹ Dongguk University (동국대학교)

Journal status: KCI accredited

ABSTRACT

This study compares several word embedding techniques to determine their impact on the performance of sentiment analysis. Sentiment analysis is an opinion mining technique that uses natural language processing to identify and extract subjective information from text, and it can be used to classify the sentiment of product reviews or comments. Because sentiment can be classified as either positive or negative, sentiment analysis can be treated as a general classification problem. Before sentiment can be analyzed, text must be converted into a representation that a computer can process. In natural language processing, transforming text such as a word or document into a vector is called word embedding. Commonly used word embedding techniques include Bag of Words, TF-IDF, and Word2Vec. So far, few studies have examined which word embedding techniques are best suited to sentiment analysis. In this study, Bag of Words, TF-IDF, and Word2Vec are used to compare and analyze the performance of movie review sentiment analysis. The data set is the IMDB data set, which is widely used in text mining. The results show that TF-IDF and Bag of Words outperformed Word2Vec, and that TF-IDF performed better than Bag of Words, although the difference between these two was not large.
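As a rough illustration of the three representations the abstract compares, the sketch below vectorizes a few toy reviews with Bag of Words and TF-IDF, and builds a Word2Vec-style document vector by averaging word vectors. Everything here is an assumption for illustration: the toy reviews are made up (not IMDB), the smoothed IDF formula is one common variant, the averaging step is one common way to turn word vectors into a document vector, and the two-dimensional `EMBEDDINGS` table is hypothetical. The abstract does not state which TF-IDF variant, aggregation method, or classifier the authors used, and a real Word2Vec model would learn its vectors from a corpus rather than use a hand-written table.

```python
import math
from collections import Counter

# Hypothetical toy reviews, NOT taken from the IMDB data set used in the paper.
DOCS = [
    "a great movie with a great cast",
    "a boring movie",
    "great acting but boring plot",
]

# Hypothetical 2-dimensional word vectors standing in for a trained Word2Vec model.
EMBEDDINGS = {
    "great": [1.0, 0.0],
    "boring": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}


def tokenize(text):
    # Lowercase whitespace tokenization; real pipelines also strip HTML and punctuation.
    return text.lower().split()


def build_vocab(docs):
    # Sorted vocabulary so every document vector uses the same column order.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(docs, vocab):
    # Bag of Words: each document becomes a vector of raw term counts.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows


def tf_idf(docs, vocab):
    # TF-IDF: term count times a smoothed inverse document frequency,
    # idf(t) = ln((1 + N) / (1 + df(t))) + 1  (assumed variant).
    n_docs = len(docs)
    token_sets = [set(tokenize(d)) for d in docs]
    df = {t: sum(t in s for s in token_sets) for t in vocab}
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    return [
        [count * idf[term] for count, term in zip(row, vocab)]
        for row in bag_of_words(docs, vocab)
    ]


def average_embedding(doc, embeddings, dim):
    # Word2Vec-style document vector: mean of the vectors of known words.
    # Documents with no known words map to the zero vector.
    vecs = [embeddings[t] for t in tokenize(doc) if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


if __name__ == "__main__":
    vocab = build_vocab(DOCS)
    print(vocab)
    for row in bag_of_words(DOCS, vocab):
        print(row)
    for row in tf_idf(DOCS, vocab):
        print([round(x, 3) for x in row])
    print(average_embedding("a great movie", EMBEDDINGS, 2))
```

In the first review, "cast" and "movie" each occur once, but "cast" gets the larger TF-IDF weight because it appears in fewer documents. This down-weighting of common terms is the main difference from raw Bag of Words counts. In a full experiment, each document vector would then be passed to a machine learning classifier that predicts positive or negative sentiment.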

KEYWORDS

sentiment analysis, Bag of Words, TF-IDF, Word2Vec, machine learning


This work was supported by the National Research Foundation of Korea.

Journal of The Korea Society of Computer and Information 2024 KCI Impact Factor : 0.81

KCI citation count: 1

Number of references: 17