@article{ART002138078,
author={Jun-Hyeog Choi and Sung-Hae Jun},
title={Big Data Smoothing and Outlier Removal for Patent Big Data Analysis},
journal={Journal of The Korea Society of Computer and Information},
issn={1598-849X},
year={2016},
volume={21},
number={8},
pages={77--84}
}
TY - JOUR
AU - Choi, Jun-Hyeog
AU - Jun, Sung-Hae
TI - Big Data Smoothing and Outlier Removal for Patent Big Data Analysis
JO - Journal of The Korea Society of Computer and Information
PY - 2016
VL - 21
IS - 8
PB - The Korean Society Of Computer And Information
SP - 77
EP - 84
SN - 1598-849X
AB - In general statistical analysis, we usually need to assume that the data are normally distributed. If this assumption is not satisfied, we cannot expect good results from the statistical analysis. Most statistical methods for handling outliers and noise also require this assumption. However, this assumption is not satisfied in big data because of its large volume and heterogeneity. Therefore, we propose a methodology based on box plots and data smoothing for controlling outliers and noise in big data analysis. The proposed methodology does not depend on the normality assumption. In addition, we select patent documents as the target domain of big data, because patent big data analysis is an important issue in the management of technology. We analyze patent documents using big data learning methods for technology analysis.
Patent data collected from patent databases around the world are preprocessed and analyzed using text mining and statistics. However, most research on patent big data analysis has not considered the problem of outliers and noise. This problem decreases prediction accuracy and increases the variance of parameter estimates. In this paper, we check for the existence of outliers and noise in patent big data. To determine whether outliers are present in the patent big data, we use box plots and smoothing visualization. We use patent documents related to three-dimensional printing technology to illustrate how the proposed methodology can be used to detect noise in the retrieved patent big data.
KW - Patent big data;Smoothing;Box-plot;Noise;Outlier;Statistical analysis
DO -
UR -
ER -
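The abstract names box-plot outlier detection and data smoothing, but this record does not give the exact procedure. The following is a minimal Python sketch under stated assumptions: it uses the conventional Tukey box-plot rule (values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are flagged) and a simple centered moving average as the smoother. The function names, the 1.5 multiplier, the window size, and the sample counts are hypothetical illustrations, not the authors' method. Because the fences are computed from quartiles rather than from a mean and standard deviation, the rule does not rely on a normality assumption, which matches the motivation stated in the abstract.

```python
from statistics import quantiles


def boxplot_fences(values, k=1.5):
    """Return Tukey box-plot fences (Q1 - k*IQR, Q3 + k*IQR).

    Assumption: the conventional k = 1.5 whisker rule; the paper's exact
    rule is not given in this record.
    """
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3].
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using the box-plot fences."""
    low, high = boxplot_fences(values, k)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


def moving_average(values, window=3):
    """Centered moving average (a hypothetical choice of smoother).

    Windows are truncated at both ends of the series, so the output has
    the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


if __name__ == "__main__":
    # Hypothetical ordered series of counts with one injected spike (90).
    counts = [12, 15, 14, 13, 16, 15, 90, 14, 13, 15]
    inliers, outliers = split_outliers(counts)
    print("outliers:", outliers)
    print("smoothed inliers:", moving_average(inliers, window=3))
```

Dropping the flagged value before smoothing keeps a single extreme point from inflating its neighbours' averages. For a strictly time-indexed series, replacing the flagged value (for example with a local median) instead of dropping it would preserve the spacing between observations.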
Jun-Hyeog Choi and Sung-Hae Jun. (2016). Big Data Smoothing and Outlier Removal for Patent Big Data Analysis. Journal of The Korea Society of Computer and Information, 21(8), 77-84.
Jun-Hyeog Choi and Sung-Hae Jun. 2016, "Big Data Smoothing and Outlier Removal for Patent Big Data Analysis", Journal of The Korea Society of Computer and Information, vol.21, no.8, pp.77-84.
Jun-Hyeog Choi, Sung-Hae Jun. "Big Data Smoothing and Outlier Removal for Patent Big Data Analysis." Journal of The Korea Society of Computer and Information 21.8 (2016): 77-84.
Jun-Hyeog Choi, Sung-Hae Jun. Big Data Smoothing and Outlier Removal for Patent Big Data Analysis. Journal of The Korea Society of Computer and Information. 2016; 21(8), 77-84.
Jun-Hyeog Choi and Sung-Hae Jun. "Big Data Smoothing and Outlier Removal for Patent Big Data Analysis." Journal of The Korea Society of Computer and Information 21, no. 8 (2016): 77-84.
Jun-Hyeog Choi; Sung-Hae Jun. Big Data Smoothing and Outlier Removal for Patent Big Data Analysis. Journal of The Korea Society of Computer and Information, 21(8), 77-84 (2016).
Jun-Hyeog Choi; Sung-Hae Jun. Big Data Smoothing and Outlier Removal for Patent Big Data Analysis. Journal of The Korea Society of Computer and Information. 2016; 21(8): 77-84.