A Comparative Analysis of the Pre-Processing in the Kaggle Titanic Competition

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2023, 28(3), pp.17-24
  • DOI : 10.9708/jksci.2023.28.03.017
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : January 20, 2023
  • Accepted : February 28, 2023
  • Published : March 31, 2023

Tai-Sung Hur 1 Suyoung Bang 1

1 Inha Technical College

Accredited

ABSTRACT

Using 'Titanic - Machine Learning from Disaster', a representative competition on Kaggle, a platform that poses and solves data science challenges, we examine how data preprocessing and model construction affect prediction accuracy and score. We select seven top-ranked, high-scoring solutions, excluding those that use redundant models or ensemble techniques, and compare and analyze their features. We confirmed that most of the preprocessing approaches have unique, differentiated characteristics, and that even where the preprocessing was almost the same, scores differed depending on the type of model. By clarifying the characteristics and analysis flow of the top scorers' preprocessing methods, this comparative analysis is expected to help Kaggle competition participants and data science beginners.
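
For readers unfamiliar with what preprocessing means in this setting, the following is a minimal sketch and not the method of any solution analyzed in the paper. It assumes the column names Name, Sex, Age, and Embarked from the competition's training file; the passenger rows are invented for illustration, and the specific steps (median imputation of Age, most-frequent imputation of Embarked, extracting a title from Name, binary encoding of Sex) are assumptions chosen as common examples.

```python
import re
from statistics import median

# Hypothetical passengers; names and values are made up for illustration.
rows = [
    {"Name": "Doe, Mr. John", "Sex": "male", "Age": 22.0, "Embarked": "S"},
    {"Name": "Roe, Mrs. Jane", "Sex": "female", "Age": 38.0, "Embarked": "C"},
    {"Name": "Poe, Miss. Anna", "Sex": "female", "Age": None, "Embarked": None},
    {"Name": "Moe, Mr. Peter", "Sex": "male", "Age": 35.0, "Embarked": "S"},
]


def extract_title(name):
    """Return the honorific between the comma and the first period, e.g. 'Mr'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


def preprocess(rows):
    """Impute missing values and encode categorical fields (illustrative only)."""
    # Median of the known ages, used to fill missing Age values.
    known_ages = [r["Age"] for r in rows if r["Age"] is not None]
    age_median = median(known_ages)

    # Most frequent known port, used to fill missing Embarked values.
    known_ports = [r["Embarked"] for r in rows if r["Embarked"] is not None]
    port_mode = max(set(known_ports), key=known_ports.count)

    processed = []
    for r in rows:
        processed.append({
            "Title": extract_title(r["Name"]),
            "Sex": 1 if r["Sex"] == "female" else 0,  # binary encoding
            "Age": r["Age"] if r["Age"] is not None else age_median,
            "Embarked": r["Embarked"] if r["Embarked"] is not None else port_mode,
        })
    return processed


processed = preprocess(rows)
for record in processed:
    print(record)
```

The resulting records could be passed to different classifiers to observe how the model choice alone changes the score, which is the kind of comparison the abstract describes.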
