Severity-based Software Quality Prediction using Class Imbalanced Data

Hong Euyseok (홍의석); 박미경

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2016, 21(4), pp.73~80
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science

Hong Euyseok ¹, 박미경 ¹

¹Sungshin Women's University

Accredited

ABSTRACT

Most fault prediction models suffer from a class imbalance problem because training data usually contain many more non-fault modules than fault modules. This imbalanced distribution makes it difficult for the models to learn the minority class. The imbalance is even greater in severity-based fault prediction, because high-severity fault modules are only a subset of the fault modules. In this paper, we propose severity-based models that address these problems using three sampling methods: Resample, SpreadSubSample, and SMOTE. Empirical results show that the Resample method suffers from typical overfitting problems and that the SpreadSubSample method does not improve the prediction performance of the models. Unlike these two methods, SMOTE performs well in terms of AUC and FNR values. In particular, the J48 decision tree model using SMOTE outperforms the other prediction models.
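
As a rough illustration of the ideas named in the abstract, the sketch below shows SMOTE-style oversampling of a minority class and the false negative rate (FNR) used as an evaluation measure. It is a minimal standard-library sketch, not WEKA's SMOTE filter and not the authors' experimental setup; the function names `smote` and `false_negative_rate`, the parameter defaults, and the sample metric vectors are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def false_negative_rate(y_true, y_pred, positive=1):
    """FNR = FN / (FN + TP): share of truly faulty modules that were missed."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return fn / (fn + tp) if (fn + tp) else 0.0


# Hypothetical metric vectors (e.g. size, complexity) for high-severity modules.
high_severity = [[10.0, 2.0], [12.0, 3.0], [11.0, 2.5]]
new_points = smote(high_severity, n_synthetic=4, k=2)
```

Because each synthetic point lies on a segment between two existing minority samples, the oversampled class gains new values rather than exact copies, which is one commonly cited reason SMOTE tends to overfit less than resampling with replacement.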

KEYWORDS

Data imbalance, Fault prediction, Severity, Sampling

This paper was written with support from the National Research Foundation of Korea.
