Two-steps Data Quality Assessment Methodology for Handling Drift of Machine Learning

  • Journal of Software Assessment and Valuation
  • Abbr : JSAV
  • 2024, 20(1), pp.75-85
  • Publisher : Korea Software Assessment and Valuation Society
  • Research Area : Engineering > Computer Science
  • Received : March 5, 2024
  • Accepted : March 20, 2024
  • Published : March 31, 2024

Okjoo Choi 1 Yukyong Kim 2

1 Pai Chai University
2 Sookmyung Women's University

Accredited

ABSTRACT

In data-driven technologies such as big data analytics and machine learning, data quality directly affects the quality of the entire system. In particular, the properties of the data used to train a machine learning model can change over time, causing the model to lose accuracy or behave differently than designed. This phenomenon is called drift. Drift can arise for many reasons, including data collection issues and market volatility. Data drift is difficult to detect immediately and can lead to inaccurate predictions, compromising the business decisions based on them. The actions required to manage drift depend on its type, extent, and nature. To take appropriate action, it is important to establish repeatable procedures for identifying drift, controlling and assessing data quality, setting thresholds for drift rates, and configuring proactive warnings. In this paper, we propose a two-step data quality assessment framework that manages drift problems arising in machine learning projects through data quality assessment indicators. We also define evaluation indices and evaluation procedures for each drift type to support drift detection.
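The abstract does not say which drift statistic or thresholds the framework uses. As a hedged illustration of the general "drift-rate threshold plus proactive warning" idea only, the sketch below scores drift in one numeric feature with the Population Stability Index (PSI), a common drift statistic swapped in here as an assumption, and maps the score to a status level. The function names `psi` and `drift_status`, the 10 equal-width bins, and the 0.1/0.25 cut-offs (a widely used rule of thumb) are hypothetical choices, not the authors' evaluation indices.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a current sample.

    Assumption: equal-width bins over the combined value range; empty bins
    are floored at `eps` so the logarithm stays finite.
    """
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # guard against constant data

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp the maximum value into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        n = len(values)
        return [max(c / n, eps) for c in counts]

    p_ref = proportions(reference)
    p_cur = proportions(current)
    # PSI = sum over bins of (current - reference) * ln(current / reference)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


def drift_status(score: float, warn: float = 0.1, alert: float = 0.25) -> str:
    """Map a drift score to a warning level (hypothetical thresholds)."""
    if score >= alert:
        return "ALERT"
    if score >= warn:
        return "WARNING"
    return "OK"


# Usage: compare a training-time sample against two incoming batches.
reference = [i / 100 for i in range(100)]  # values 0.00 .. 0.99
same = list(reference)                     # no drift
shifted = [v + 0.5 for v in reference]     # mean shifted by 0.5

print(drift_status(psi(reference, same)))     # OK
print(drift_status(psi(reference, shifted)))  # ALERT
```

In practice the reference sample would come from the training data and the check would run per feature on each new batch, with the status feeding whatever alerting channel the project uses; the per-drift-type indices and the two assessment steps defined in the paper would replace this single generic statistic.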
