
Performance Comparison and SHAP Interpretation of Movie Box Office Prediction Models Based on CatBoost and PyCaret

  • Journal of Internet of Things and Convergence
  • Abbr : JKIOTS
  • 2024, 10(5), pp.213-226
  • Publisher : The Korea Internet of Things Society
  • Research Area : Engineering > Computer Science > Internet Information Processing
  • Received : September 29, 2024
  • Accepted : October 14, 2024
  • Published : October 31, 2024

Huiseong Kim 1, Jihoon Moon 2

1 Department of AI and Big Data, Soonchunhyang University
2 Soonchunhyang University

Accredited

ABSTRACT

This study uses box office data collected by the Korean Film Council (KOFIC) to develop and compare predictive models for cinema attendance and revenue. Data preprocessing removed irrelevant variables and handled missing values separately for categorical and numerical data to ensure consistency. Exploratory data analysis identified key variables, including Seoul audience size, revenue, total number of screens, film genre, rating, and month of release, and showed that Seoul audience size and revenue correlate strongly with box office performance. Based on this analysis, predictive models were developed using CatBoost and PyCaret AutoML. CatBoost was chosen for its effectiveness in handling categorical variables such as director name, production company, and genre, while PyCaret AutoML was chosen because it automates the modeling process and makes it easy for non-experts to compare different models. Model performance was evaluated using mean absolute error (MAE), root mean squared error (RMSE), and R-squared (R²), with CatBoost demonstrating superior accuracy. In addition, SHAP (SHapley Additive exPlanations) was used to interpret the models, identifying Seoul audience size and revenue as the most significant predictors. This research presents reliable box office prediction models intended to improve decision-making in the film industry and support the development of data-driven strategies.
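
The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration; they are not the study's data or code, and the authors' actual evaluation pipeline is not described on this page.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired actual/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large errors more heavily.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of variance in y_true explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical attendance figures (in thousands), invented for illustration.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Lower MAE and RMSE and an R² closer to 1 indicate a better fit, which is the sense in which one model can be said to outperform another on these metrics.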
