
An Implementation of TF-IDF Feature Extraction and Machine Learning Based Web Attack Detection System

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2026, 31(1), pp.109~120
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : November 10, 2025
  • Accepted : December 30, 2025
  • Published : January 30, 2026

Eun ji Song¹, Seong-Cho Hong¹, Ah Reum Kang¹

¹Pai Chai University

Accredited

ABSTRACT

With the rapid proliferation of web applications, HTTP-based cyberattacks continue to rise, underscoring the need for effective web attack detection systems. This study proposes and evaluates a detection system that combines TF-IDF feature extraction with multiple machine-learning classifiers. Treating HTTP requests as text, we apply natural language processing techniques to extract TF-IDF features and evaluate Logistic Regression, Random Forest, and XGBoost on the CSIC 2010 HTTP dataset. In our experiments, XGBoost achieves the best performance in accuracy (98.77%), ROC AUC (0.994), and PR AUC, while Logistic Regression and Random Forest reach accuracies of 97.83% and 97.50%, respectively. All three models exceed 96% precision, suggesting that they are viable for deployment in real-world environments. These results indicate that interpretable machine-learning approaches can achieve competitive performance without resorting to complex deep learning models.
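The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical sketch of its TF-IDF plus Logistic Regression variant in dependency-free Python. It is not the authors' code. The tokenizer, the helper names (`SimpleTfidf`, `train_logreg`, `predict_proba`), the smoothed IDF form ln((1 + n) / (1 + df)) + 1 (the form scikit-learn's `TfidfVectorizer` uses by default), the SGD training loop and its hyperparameters, and the eight toy requests are all assumptions. None of the requests come from CSIC 2010.

```python
import math
import re
from collections import Counter

# Assumption: lowercase alphanumeric runs are tokens, and every other
# non-space character (quote, '<', '=', ';', '&', ...) is its own token.
TOKEN_RE = re.compile(r"[a-z0-9_]+|[^a-z0-9_\s]")


def tokenize(request: str) -> list[str]:
    return TOKEN_RE.findall(request.lower())


class SimpleTfidf:
    """Hypothetical minimal TF-IDF: raw counts x smoothed IDF, L2-normalized."""

    def fit(self, docs: list[str]) -> "SimpleTfidf":
        n = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))  # document frequency, not term count
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so df == n gives exactly 1.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        return self

    def transform(self, doc: str) -> dict[str, float]:
        # Tokens never seen during fit are dropped.
        counts = Counter(t for t in tokenize(doc) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.5, l2=1e-3):
    """Hypothetical logistic regression on sparse dict vectors, fitted by SGD."""
    w: dict[str, float] = {}
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))
            g = p - label  # gradient of the log loss w.r.t. the linear score
            for t, v in x.items():
                wt = w.get(t, 0.0)
                w[t] = wt - lr * (g * v + l2 * wt)
            b -= lr * g
    return w, b


def predict_proba(w, b, x) -> float:
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


# Hypothetical toy requests (NOT from CSIC 2010); label 1 = attack.
TRAIN = [
    ("GET /shop/index.jsp?id=3 HTTP/1.1", 0),
    ("GET /shop/cart.jsp?item=12&qty=1 HTTP/1.1", 0),
    ("POST /shop/login.jsp user=alice&pass=secret HTTP/1.1", 0),
    ("GET /shop/search.jsp?q=shoes HTTP/1.1", 0),
    ("GET /shop/index.jsp?id=3' OR '1'='1 HTTP/1.1", 1),
    ("GET /shop/search.jsp?q=<script>alert(1)</script> HTTP/1.1", 1),
    ("POST /shop/login.jsp user=admin'--&pass=x HTTP/1.1", 1),
    ("GET /shop/cart.jsp?item=12;DROP TABLE users HTTP/1.1", 1),
]

docs = [d for d, _ in TRAIN]
labels = [lbl for _, lbl in TRAIN]
vectorizer = SimpleTfidf().fit(docs)
w, b = train_logreg([vectorizer.transform(d) for d in docs], labels)

for req in (
    "GET /shop/search.jsp?q=<script>alert(document.cookie)</script> HTTP/1.1",
    "GET /shop/index.jsp?id=42 HTTP/1.1",
):
    print(f"{predict_proba(w, b, vectorizer.transform(req)):.3f}  {req}")
```

The pure-Python form only makes the mechanics visible. Tokens that occur in every request (`http`, `/`, `=`) get the minimum IDF of 1, while rare attack markers such as `script` or `'` get high IDF weights and dominate the normalized vector. A practical setup would more likely use a library vectorizer and also train the Random Forest and XGBoost classifiers the study compares.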
