
Validation of Difficulty Classification Concordance using GPT-4o for Physical Therapy Exam Questions

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2025, 30(8), pp.233~247
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : July 29, 2025
  • Accepted : August 19, 2025
  • Published : August 29, 2025

Wansuk Choi¹, TaeSeok Choi², HeeJoon Shin¹, Hongrae Kim¹, Ma Xu¹, Do Heon Kwon¹, Jin Yuemei¹, Myeong-Chul Park¹, Seoyoon Heo³

¹Kyungwoon University
²Kunjang University
³Kyungbok University


ABSTRACT

In this paper, we evaluated the validity of GPT-4o for classifying the difficulty of physical therapy examination questions against human expert assessments. A multi-institutional cross-sectional validation study was conducted across three South Korean universities with 180 physical therapy professionals (11 educators, 169 students) evaluating 525 questions previously classified by GPT-4o into five difficulty levels. Participants rated question difficulty on a 5-point Likert scale. GPT-4o classifications demonstrated exceptional correlation with human assessments (r = 0.988, p < 0.001), explaining 97.6% of the variance in human ratings. Bland-Altman analysis revealed minimal systematic bias (mean difference = -0.233). Inter-rater reliability was excellent for both educators (ICC = 0.912) and students (ICC = 0.908), with no significant differences between institutions (p = 0.794). These findings support the use of GPT-4o as a reliable tool for educational assessment in physical therapy programs, with broad applicability for curriculum development and examination design.
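The agreement statistics reported in the abstract (Pearson correlation, variance explained as r², and the Bland-Altman mean difference with limits of agreement) can be sketched in a few lines of code. The rating vectors below are hypothetical illustrations on the paper's 5-point scale, not the study's actual data, and the helper functions are assumptions about the standard definitions of these statistics rather than the authors' implementation.

```python
# Sketch of the agreement analyses used in the study, on illustrative
# (hypothetical) data: Pearson r, r-squared, and Bland-Altman bias.
import math

# Hypothetical per-question scores: GPT-4o difficulty level (1-5) and
# the mean human rating on the same 5-point Likert scale.
gpt4o = [1, 2, 2, 3, 3, 4, 4, 5, 5, 3]
human = [1.2, 1.9, 2.3, 3.1, 2.8, 4.2, 3.9, 4.8, 5.0, 3.2]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def bland_altman(x, y):
    """Mean difference (systematic bias) and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

r = pearson_r(gpt4o, human)
bias, lo, hi = bland_altman(gpt4o, human)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")        # variance explained
print(f"bias = {bias:.3f}, LoA = [{lo:.3f}, {hi:.3f}]")
```

With the real data this procedure would reproduce the reported r = 0.988 (r² ≈ 0.976) and the Bland-Altman bias of -0.233; the ICC values for inter-rater reliability require a separate variance-components model not shown here.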
