HAL: An Encoding Method for Efficient Log Data Pipelines

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2026, 31(6), pp.1~15
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : April 23, 2026
  • Accepted : May 21, 2026
  • Published : June 30, 2026

Min-gi Lee 1 Choong-hee Cho 1

1 Sahmyook University

Accredited

ABSTRACT

In large-scale service environments, data pipelines are widely used to collect and transmit web logs in real time. Even when a serialization method such as Avro is applied, repetitive values remain in the data and increase network transmission costs. To address this limitation, this study proposes HAL (Hybrid Analytics Log), a lightweight encoding method that can be integrated into existing systems without significant structural modifications. HAL replaces frequently repeated string combinations with integer identifiers and preserves unmatched data in its original form, so that processing remains lossless. Experiments on Nginx access logs show that the proposed method reduces Kafka message size by up to 16.2%, Kafka payload bandwidth by 17%, and data ingestion time by 24.6%, without any significant degradation in total processing time or throughput. In addition, Parquet storage size in the final storage stage is reduced by a further 4.8%. These results demonstrate that the proposed approach improves both transmission and storage efficiency using only low-cost computational operations.
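The substitution idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify HAL's wire format, so the choice of fields (`KEY_FIELDS`), the `top_k` cutoff, the `(id, payload)` tuple layout, and all function names below are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical choice of fields whose combined values tend to repeat in access logs.
KEY_FIELDS = ("method", "path", "status", "user_agent")


def build_dictionary(records, top_k=2):
    """Map the top_k most frequent field combinations to integer IDs."""
    counts = Counter(tuple(r[f] for f in KEY_FIELDS) for r in records)
    return {combo: i for i, (combo, _) in enumerate(counts.most_common(top_k))}


def encode(record, dictionary):
    """Return (id, remaining fields) for a known combination, else (None, record)."""
    combo = tuple(record[f] for f in KEY_FIELDS)
    if combo in dictionary:
        rest = {k: v for k, v in record.items() if k not in KEY_FIELDS}
        return (dictionary[combo], rest)
    # Unmatched data is kept in its original form, so nothing is lost.
    return (None, dict(record))


def decode(encoded, dictionary):
    """Restore the original record from either encoded form."""
    ident, payload = encoded
    if ident is None:
        return dict(payload)
    reverse = {i: combo for combo, i in dictionary.items()}
    restored = dict(zip(KEY_FIELDS, reverse[ident]))
    restored.update(payload)
    return restored
```

In this sketch, a record whose field combination is in the dictionary travels as a small integer plus its non-repeating fields (for example a timestamp), while any other record passes through unchanged. Decoding with the same dictionary therefore reproduces every input record exactly, which is the lossless property the abstract describes.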
