Objectives Efforts to develop a Sasang type diagnostic method with high validity and clinical utility have continued, with growing interest in large-scale data and machine learning (ML). However, ML often suffers from reliance on input data, overfitting, and limited theoretical interpretability, reducing its clinical applicability. This study aimed to develop a diagnostic model with improved interpretability, predictive performance, and accessibility by systematically selecting clinical features and using an intuitive ML interface.
Methods Psychological, physical, and hematological features were selected stepwise through literature review, analysis of covariance (ANCOVA), and correlation analysis. Data from 2,407 participants were split into training (70%) and test (30%) sets. Feature contribution was evaluated by information gain (IG), and the diagnostic performance of random forest (RF), naïve Bayes (NB), neural network (NN), and stochastic gradient descent (SGD) was assessed using accuracy, precision, recall, F1 score, Matthews correlation coefficient (MCC), the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). Analyses were performed using Orange, an open-source Python-based graphical user interface (GUI).
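As a minimal sketch of two steps named in the Methods, the code below computes information gain for a discrete feature and performs a shuffled 70/30 split using only the Python standard library. It is not the study's Orange workflow: the function names, the random seed, the rounding rule at the cut point, and the toy data are all assumptions made for illustration, and continuous features such as BMI would first need to be discretized, which is not shown here.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after grouping by feature value."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def split_70_30(rows, seed=0):
    """Shuffle a copy of the rows and return (training, test) in a 70/30 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical choice
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy data invented for illustration only; not taken from the study.
bmi_band = ["high", "high", "low", "low", "mid", "mid", "high", "low"]
sasang_type = ["A", "A", "B", "B", "C", "C", "A", "B"]

ig_bmi = information_gain(bmi_band, sasang_type)            # feature separates the classes
ig_constant = information_gain(["same"] * 8, sasang_type)   # uninformative feature
train, test = split_70_30(list(zip(bmi_band, sasang_type)))
```

In this toy case the banded feature determines the class exactly, so its information gain equals the full label entropy, while a constant feature scores zero.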
Results IG analysis showed notable contributions from BMI (0.531), RMRw (0.292), RMR (0.207), and three SPQ subscales (0.329).
Including BMI yielded high accuracy (0.817); after BMI was removed to balance the features, accuracy was still 0.755.
The efficiency of each algorithm depended on its interaction with the input data.
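For readers unfamiliar with the performance measures, the following sketch shows how accuracy and the multiclass form of the Matthews correlation coefficient can be computed from true and predicted labels. It is an illustration under stated assumptions, not the study's evaluation code: the function names and the toy labels are hypothetical, and the MCC uses the standard multiclass generalization based on class counts.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient computed from class counts."""
    s = len(y_true)                                      # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correctly classified samples
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    numerator = c * s - sum(true_counts[k] * pred_counts[k] for k in classes)
    spread_true = s * s - sum(n * n for n in true_counts.values())
    spread_pred = s * s - sum(n * n for n in pred_counts.values())
    denominator = math.sqrt(spread_true * spread_pred)
    # Return 0.0 when the coefficient is undefined (all samples in one class).
    return numerator / denominator if denominator else 0.0


# Toy labels invented for illustration only; not taken from the study.
y_true = ["A", "A", "A", "B", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "C", "C", "C"]

acc = accuracy(y_true, y_pred)
mcc = multiclass_mcc(y_true, y_pred)
```

Unlike accuracy, MCC accounts for how the predictions are distributed across classes, which is why it is often reported alongside accuracy when class sizes differ.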
Conclusions Balanced integration of psychological, physical, and hematological features improved consistency with Sasang typology theory, clinical utility, and model generalizability. Appropriate feature selection had a greater impact on performance than algorithm choice. The proposed procedure and Orange workflow offer clinicians and researchers a practical foundation for ML-based analysis, contributing to the globalization of Sasang typology.