Anomalous Pattern Analysis of Large-Scale Logs with Spark Cluster Environment

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2024, 29(3), pp.127-136
  • DOI : 10.9708/jksci.2024.29.03.127
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : January 8, 2024
  • Accepted : February 28, 2024
  • Published : March 29, 2024

Sion Min 1 Youyang Kim 1 Byungchul Tak 1

1 Kyungpook National University

Accredited

ABSTRACT

This study explores the correlation between system anomalies and large-scale logs in a Spark cluster environment. Although research on log-based anomaly detection is growing, existing work makes limited use of logs from the various components of the cluster and gives little consideration to the relationship between anomalies and the system. This paper therefore analyzes the distribution of normal and abnormal logs and examines whether anomalies can be detected from the occurrence of log templates. Normal and abnormal log data are generated with Hadoop and Spark, and t-SNE and K-means clustering are applied to identify the templates of logs produced in anomalous situations, which helps characterize the anomalies. Finally, log templates that occur only during abnormal situations are identified, demonstrating the potential for anomaly detection.
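
The final step described in the abstract, finding log templates that occur only during abnormal situations, can be illustrated with a minimal sketch. It is not the authors' implementation: the regex-based `to_template` function is a crude stand-in for a real log parser, the function names are hypothetical, the sample log lines are invented for illustration rather than taken from the paper's data, and the t-SNE and K-means steps are omitted.

```python
import re
from collections import Counter


def to_template(line: str) -> str:
    """Reduce a log line to a rough template by masking variable fields.

    Assumption: masking hex ids and numbers is enough for this toy example;
    a real study would use a dedicated log-template parser.
    """
    line = re.sub(r"0x[0-9a-fA-F]+", "<*>", line)   # hex identifiers
    line = re.sub(r"\d+(?:\.\d+)*", "<*>", line)    # integers, decimals, IPs
    return line.strip()


def template_counts(lines):
    """Count how often each template occurs in a collection of log lines."""
    return Counter(to_template(line) for line in lines)


def abnormal_only_templates(normal_lines, abnormal_lines):
    """Return templates (with counts) seen in abnormal runs but never in normal runs."""
    normal = template_counts(normal_lines)
    abnormal = template_counts(abnormal_lines)
    return {t: c for t, c in abnormal.items() if t not in normal}


# Invented sample lines, for illustration only.
normal_logs = [
    "Finished task 3 in stage 1 (TID 17) in 120 ms",
    "Finished task 4 in stage 1 (TID 18) in 98 ms",
]
abnormal_logs = [
    "Finished task 5 in stage 2 (TID 40) in 110 ms",
    "Lost executor 2 on host 10.0.0.5",
    "Lost executor 7 on host 10.0.0.9",
]

unique = abnormal_only_templates(normal_logs, abnormal_logs)
for template, count in unique.items():
    print(count, template)
```

Templates shared by both runs drop out, so only the candidates that could signal an anomaly remain. In practice the per-template counts from `template_counts` could also serve as feature vectors for the clustering and visualization steps the abstract mentions.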


This paper was written with support from the National Research Foundation of Korea.