@article{ART003349260,
author={Min-gi Lee and Choong-hee Cho},
title={HAL: An Encoding Method for Efficient Log Data Pipelines},
journal={Journal of The Korea Society of Computer and Information},
issn={1598-849X},
year={2026},
volume={31},
number={6},
pages={1-15}
}
TY  - JOUR
AU  - Min-gi Lee
AU  - Choong-hee Cho
TI  - HAL: An Encoding Method for Efficient Log Data Pipelines
JO  - Journal of The Korea Society of Computer and Information
PY  - 2026
VL  - 31
IS  - 6
PB  - The Korean Society Of Computer And Information
SP  - 1
EP  - 15
SN  - 1598-849X
AB  - In recent large-scale service environments, data pipelines are widely used to collect and transmit web logs in real time. In this process, even when serialization methods such as Avro are applied, repetitive values remain in the data, leading to increased network transmission costs. To address this limitation, this study proposes HAL (Hybrid Analytics Log), a lightweight encoding method that can be integrated into existing systems without significant structural modifications. The proposed approach replaces frequently repeated string combinations with integer identifiers, while preserving unmatched data in its original form to ensure lossless processing. Experimental results using Nginx access logs show that the proposed method reduces Kafka message size by up to 16.2%, Kafka payload bandwidth by 17%, and data ingestion time by 24.6%, without any significant degradation in total processing time or throughput. In addition, the Parquet storage size is further reduced by 4.8% in the final storage stage. These results demonstrate that the proposed approach effectively improves both transmission and storage efficiency through low-cost computational operations.
KW  - Data pipeline;log processing;serialization;data compression;network efficiency;Kafka
DO  - 
UR  - 
ER  - 
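The abstract's core idea, replacing frequently repeated string combinations with integer identifiers while passing unmatched data through in its original form, can be illustrated with a small dictionary-encoding sketch. This is not the authors' HAL implementation: the choice of which log fields form a "combination", the `min_count` threshold, and the helper names `build_dictionary`, `encode`, and `decode` are hypothetical assumptions made only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# Hypothetical record shape: a tuple of log fields (e.g. method, path, status, user agent).
Combo = Tuple[str, ...]
# An encoded record is either an integer ID (dictionary hit) or the raw tuple (miss).
Encoded = Union[int, Combo]


def build_dictionary(records: List[Combo], min_count: int = 2) -> Dict[Combo, int]:
    """Assign integer IDs to combinations seen at least `min_count` times.

    `min_count` is an assumed tuning knob, not a parameter from the paper.
    """
    counts = Counter(records)
    frequent = [combo for combo, n in counts.most_common() if n >= min_count]
    return {combo: i for i, combo in enumerate(frequent)}


def encode(records: List[Combo], dictionary: Dict[Combo, int]) -> List[Encoded]:
    """Replace known combinations with their ID; keep unmatched records verbatim."""
    return [dictionary.get(r, r) for r in records]


def decode(encoded: List[Encoded], dictionary: Dict[Combo, int]) -> List[Combo]:
    """Invert the mapping; raw tuples pass through, so the round trip is lossless."""
    reverse = {i: combo for combo, i in dictionary.items()}
    return [reverse[e] if isinstance(e, int) else e for e in encoded]


if __name__ == "__main__":
    # Hypothetical Nginx-style records reduced to four fields.
    logs = [
        ("GET", "/index.html", "200", "Mozilla/5.0"),
        ("GET", "/index.html", "200", "Mozilla/5.0"),
        ("POST", "/login", "302", "curl/8.0"),
        ("GET", "/index.html", "200", "Mozilla/5.0"),
    ]
    table = build_dictionary(logs)
    packed = encode(logs, table)
    print(packed)
    print(decode(packed, table) == logs)
```

In this sketch the dictionary must be shared by producer and consumer for decoding to work; how HAL distributes or versions its mapping is not described in the abstract.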
Min-gi Lee and Choong-hee Cho. (2026). HAL: An Encoding Method for Efficient Log Data Pipelines. Journal of The Korea Society of Computer and Information, 31(6), 1-15.
Min-gi Lee and Choong-hee Cho. 2026, "HAL: An Encoding Method for Efficient Log Data Pipelines", Journal of The Korea Society of Computer and Information, vol. 31, no. 6, pp. 1-15.
Min-gi Lee, Choong-hee Cho. "HAL: An Encoding Method for Efficient Log Data Pipelines." Journal of The Korea Society of Computer and Information 31.6 (2026): 1-15.
Min-gi Lee, Choong-hee Cho. HAL: An Encoding Method for Efficient Log Data Pipelines. Journal of The Korea Society of Computer and Information. 2026; 31(6), 1-15.
Min-gi Lee and Choong-hee Cho. "HAL: An Encoding Method for Efficient Log Data Pipelines." Journal of The Korea Society of Computer and Information 31, no. 6 (2026): 1-15.
Min-gi Lee; Choong-hee Cho. (2026). HAL: An Encoding Method for Efficient Log Data Pipelines. Journal of The Korea Society of Computer and Information, 31(6), 1-15.
Min-gi Lee; Choong-hee Cho. HAL: An Encoding Method for Efficient Log Data Pipelines. Journal of The Korea Society of Computer and Information. 2026;31(6):1-15.