Missing Value Imputation Technique for Water Quality Dataset

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2024, 29(4), pp.39-46
  • DOI : 10.9708/jksci.2024.29.04.039
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : February 23, 2024
  • Accepted : March 21, 2024
  • Published : April 30, 2024

Jin-Young Jun 1 Youn-A Min 1

1 Hanyang Cyber University

Accredited

ABSTRACT

Many researchers evaluate water quality using various models. Such models require a dataset without missing values, but in the real world most datasets contain missing values for various reasons. Simply deleting samples that have missing values can distort the distribution of the underlying data and poses a significant risk of biasing the model's inference when the missingness mechanism is not missing completely at random (MCAR). In this study, to identify the most appropriate technique for handling missing values in water quality data, we tested several imputation techniques based on the existing KNN and MICE imputation methods, with and without the generative neural network models Autoencoder (AE) and Denoising Autoencoder (DAE). The results show that the combined KNN and MICE imputation without generative networks provides estimates closest to the true values. When binary classification models based on support vector machine and ensemble algorithms were evaluated after applying the combined imputation technique to the observed water quality dataset with missing values, they performed better in terms of accuracy, F1 score, ROC-AUC score, and MCC than models evaluated after deleting samples with missing values.
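
As a rough illustration of the comparison the abstract describes, the sketch below hides known values in a synthetic table, imputes them with KNN and with a MICE-style iterative imputer, and scores each estimate against the hidden truth. It is not the authors' implementation. The abstract does not say how KNN and MICE are combined, so the plain average used here is an assumption, as are the synthetic data, the 10% missing rate, and every parameter value. scikit-learn's IterativeImputer stands in for MICE.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to import IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer

rng = np.random.default_rng(0)

# Synthetic, correlated features standing in for water quality measurements
# (hypothetical data, not the dataset used in the paper).
n_samples, n_features = 200, 4
latent = rng.normal(size=(n_samples, 1))
X_true = latent @ rng.normal(size=(1, n_features))
X_true += 0.1 * rng.normal(size=(n_samples, n_features))

# Hide about 10% of the entries completely at random so the truth is known.
mask = rng.random(X_true.shape) < 0.10
mask[mask.all(axis=1), 0] = False  # keep at least one observed value per row
X_missing = X_true.copy()
X_missing[mask] = np.nan

# Impute with each method separately.
knn_est = KNNImputer(n_neighbors=5).fit_transform(X_missing)
mice_est = IterativeImputer(max_iter=10, random_state=0).fit_transform(X_missing)

# Assumed combination rule: element-wise average of the two estimates.
combined_est = (knn_est + mice_est) / 2.0


def rmse_on_masked(estimate):
    """Root mean squared error over the entries that were hidden."""
    diff = estimate[mask] - X_true[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


scores = {
    "knn": rmse_on_masked(knn_est),
    "mice": rmse_on_masked(mice_est),
    "combined": rmse_on_masked(combined_est),
}
for name, value in scores.items():
    print(f"{name}: RMSE = {value:.4f}")
```

Which method scores best depends on the data and on the missingness mechanism, so the ranking printed here says nothing about the paper's results. On real data the same masking approach can be applied to the fully observed rows to choose an imputer before fitting a classifier.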
