MSHC: A Multi-Scale Hierarchical Embedding Compression Framework Preserving Semantic Structure

  • Journal of Internet of Things and Convergence
  • Abbr : JKIOTS
  • 2026, 12(1), pp.75~101
  • Publisher : The Korea Internet of Things Society
  • Research Area : Engineering > Computer Science > Internet Information Processing
  • Received : January 8, 2026
  • Accepted : February 20, 2026
  • Published : February 28, 2026

Jaeyong Jung 1 Nak Hyun Jung 2

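1 Seoul School of Integrated Sciences and Technologies (서울과학종합대학원)
2 Seoul School of Integrated Sciences and Technologies (서울과학종합대학원대학교)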

Accredited

ABSTRACT

This study proposes Multi-Scale Hierarchical Embedding Compression (MSHC), a framework for deploying large language model (LLM) embeddings on-device. Whereas previous studies have focused on the trade-off between compression rate and performance preservation, this work introduces a new perspective: the 'designability of semantic structure after compression'. MSHC is a four-stage pipeline consisting of Matryoshka Representation Learning (MRL), Soft-to-Hard Vector Quantization (VQ), Product Quantization (PQ), and a Sparse Autoencoder (SAE), designed to preserve hierarchical semantic structure at each stage. In experiments on Korean benchmarks (KLUE-STS, KLUE-NLI, NSMC), applying MRL alone retained 96.8% of performance at 8x compression, and task-aware fine-tuning significantly restored the performance of the MSHC pipeline. Furthermore, experiments with hierarchical weighted search and coarse-to-fine search demonstrated that MSHC preserves semantic structure suited to multi-stage search pipelines. The study proposes shifting the evaluation criterion for embedding compression from 'immediate performance' to 'semantic manipulability'.
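
The coarse-to-fine search mentioned in the abstract can be illustrated with a minimal sketch. It assumes only the general Matryoshka property that the leading coordinates of an embedding form a usable lower-dimensional embedding on their own. The function names and parameters (`coarse_dim`, `shortlist_size`, `top_k`) are hypothetical and not taken from the paper, and the VQ, PQ, and SAE stages are omitted entirely, so this should not be read as the authors' implementation.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def truncate(v, dim):
    """Matryoshka-style truncation: keep the first `dim` coordinates, then renormalize."""
    return normalize(v[:dim])


def cosine(a, b):
    """Cosine similarity of two unit-norm vectors, i.e. their dot product."""
    return sum(x * y for x, y in zip(a, b))


def coarse_to_fine_search(query, corpus, coarse_dim, shortlist_size, top_k):
    """Shortlist candidates with truncated vectors, then rerank them at full width.

    Hypothetical helper for illustration: `corpus` is a list of equal-length
    float vectors and the return value is a list of corpus indices.
    """
    # Coarse stage: cheap comparison on the first `coarse_dim` coordinates only.
    q_coarse = truncate(query, coarse_dim)
    coarse_scores = [
        (cosine(q_coarse, truncate(doc, coarse_dim)), i)
        for i, doc in enumerate(corpus)
    ]
    shortlist = [i for _, i in sorted(coarse_scores, reverse=True)[:shortlist_size]]

    # Fine stage: full-dimensional comparison restricted to the shortlist.
    q_full = normalize(query)
    fine_scores = [(cosine(q_full, normalize(corpus[i])), i) for i in shortlist]
    return [i for _, i in sorted(fine_scores, reverse=True)[:top_k]]
```

As a rough illustration of the dimension arithmetic, truncating a 768-dimensional embedding to its first 96 coordinates would store 8x fewer values per vector. Both figures are assumed here for the example; the abstract does not state the dimensions used. The coarse stage touches every document but only at the reduced width, while the full-width comparison is paid only for the `shortlist_size` survivors.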


This paper was written with support from the National Research Foundation of Korea.