Resource Sharing Scheme for Continuous Aggregation Query Processing in Spark Frameworks

  • Journal of Knowledge Information Technology and Systems
  • Abbr : JKITS
  • 2020, 15(5), pp.671-681
  • DOI : 10.34163/jkits.2020.15.5.010
  • Publisher : Korea Knowledge Information Technology Society
  • Research Area : Interdisciplinary Studies > Interdisciplinary Research
  • Received : July 30, 2020
  • Accepted : October 13, 2020
  • Published : October 31, 2020

Weonil Jeong 1

1 Hoseo University

Accredited

ABSTRACT

Recently, the volume of real-time, location-linked data generated by sensors in IoT environments has grown rapidly, and many studies have explored services built on big data platforms. On such platforms, researchers have proposed continuous query processing that includes aggregation operations. Its purpose is to analyze the sensor big data that flows in constantly while serving the needs of multiple users. However, existing resource-sharing-based aggregation methods have difficulty executing time-based and tuple-based queries concurrently. Because they also maintain duplicate copies of aggregate information, they increase memory usage and degrade query processing performance. Therefore, in this paper, we propose a resource sharing method that efficiently supports continuous aggregation query processing on Spark, an in-memory distributed processing framework. The proposed method minimizes the additional cost of processing aggregate information through linear resource sharing based on summary information over the query processing scope, and it reduces the memory required for continuous aggregation query processing. It also improves query processing performance by preventing the duplicate generation of aggregate information. The proposed approach is implemented on the Spark framework to ensure real-time performance of continuous aggregation query processing over big data. Finally, a performance evaluation shows that the proposed resource sharing technique can be used effectively for continuous aggregation query processing.
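The abstract does not describe the paper's actual data structures. The sketch below is only a rough illustration of the general idea of serving several continuous aggregation queries from one shared summary instead of per-query copies. It uses a swapped-in technique, running prefix sums, which is not necessarily the author's method, and it runs in plain Python rather than on Spark. The names `SharedPrefixSummary`, `time_window`, `tuple_window`, and `evict_before` are hypothetical. The sketch assumes tuples arrive in timestamp order and that the aggregate is invertible (SUM and COUNT, and therefore AVG). MIN and MAX would need a different structure.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SharedPrefixSummary:
    """Hypothetical shared summary for many continuous SUM/COUNT/AVG queries.

    Each arriving tuple is folded into running (prefix) sums exactly once.
    Every query, whether time-based or tuple-based, reads its window as the
    difference of two prefix entries, so no query keeps its own aggregate copy.
    """
    timestamps: List[float] = field(default_factory=list)
    prefix_sum: List[float] = field(default_factory=lambda: [0.0])
    prefix_cnt: List[int] = field(default_factory=lambda: [0])

    def append(self, ts: float, value: float) -> None:
        # Assumption: tuples arrive in non-decreasing timestamp order.
        self.timestamps.append(ts)
        self.prefix_sum.append(self.prefix_sum[-1] + value)
        self.prefix_cnt.append(self.prefix_cnt[-1] + 1)

    def _range(self, lo: int, hi: int) -> Tuple[float, int]:
        # (sum, count) over retained tuple positions lo .. hi-1.
        s = self.prefix_sum[hi] - self.prefix_sum[lo]
        c = self.prefix_cnt[hi] - self.prefix_cnt[lo]
        return s, c

    def time_window(self, now: float, length: float) -> Tuple[float, int]:
        # Time-based window: tuples with now - length < ts <= now.
        lo = bisect_right(self.timestamps, now - length)
        hi = bisect_right(self.timestamps, now)
        return self._range(lo, hi)

    def tuple_window(self, n: int) -> Tuple[float, int]:
        # Tuple-based window: the n most recent retained tuples.
        hi = len(self.timestamps)
        return self._range(max(0, hi - n), hi)

    def evict_before(self, ts: float) -> None:
        # Drop tuples older than ts that no registered query still needs.
        # Slicing all three lists by the same k keeps prefix differences valid.
        k = bisect_left(self.timestamps, ts)
        del self.timestamps[:k]
        del self.prefix_sum[:k]
        del self.prefix_cnt[:k]


if __name__ == "__main__":
    summary = SharedPrefixSummary()
    for ts, v in [(1, 10.0), (2, 20.0), (3, 30.0), (5, 40.0)]:
        summary.append(ts, v)
    # Two concurrent queries share the same summary.
    print(summary.time_window(now=5, length=3))  # (70.0, 2): ts in (2, 5]
    print(summary.tuple_window(3))               # (90.0, 3): last 3 tuples
```

In this sketch, each tuple updates the shared summary once no matter how many queries are registered. Each query answer costs one or two binary searches. Memory grows with the number of retained tuples rather than with the number of queries. A real deployment would also have to bound floating-point drift in long-running prefix sums and decide eviction from the largest window any query still needs.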
