Efficient K-Anonymization Implementation with Apache Spark (스파크 분산 환경 기반 효율적인 K-익명화 기법 연구)

Tae-Su Kim (김태수); Kim Jong Wook (김종욱)

doi:10.9708/jksci.2018.23.11.017

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2018, 23(11), pp.17~24
DOI : 10.9708/jksci.2018.23.11.017
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science
Received : August 31, 2018
Accepted : October 22, 2018
Published : November 30, 2018

Tae-Su Kim ¹, Kim Jong Wook ¹

¹Sangmyung University

Accredited

ABSTRACT

Today, we are living in the era of data and information. With the advent of the Internet of Things (IoT), the popularity of social networking sites, and the development of mobile devices, a large amount of data is being produced in diverse areas. The collection of such data generated in various areas is called big data. As the importance of big data grows, there has been a growing need to share big data containing information about individual entities. Because big data contains sensitive information about individuals, directly releasing it for public use may violate existing privacy requirements. Thus, privacy-preserving data publishing (PPDP) has been actively studied as a way to share big data containing personal information for public use while preserving the privacy of individuals. K-anonymity, which is the most popular method in the area of PPDP, transforms each record in a table such that at least k records have the same values for the given quasi-identifier attributes, and thus each record is indistinguishable from the other records in the same class. As big data continues to grow in size, there is a growing demand for methods that can efficiently anonymize vast amounts of data. Thus, in this paper, we develop an efficient k-anonymity method using the Spark distributed framework. Experimental results show that significant gains in processing time can be achieved with the developed method.
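The k-anonymity property described in the abstract can be illustrated with a minimal single-machine sketch. This is not the Spark-based method developed in the paper; it only checks the property and applies one simple generalization. The toy table, the attribute names (`age`, `zip`, `diagnosis`), and the 10-year age ranges are all assumptions made for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values that
    occurs in `records` is shared by at least k records."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Replace an exact age with a range such as '30-39'.
    The range width is a hypothetical generalization rule."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical toy table: 'age' and 'zip' act as quasi-identifiers,
# 'diagnosis' is the sensitive attribute.
raw = [
    {"age": 31, "zip": "130", "diagnosis": "flu"},
    {"age": 36, "zip": "130", "diagnosis": "asthma"},
    {"age": 42, "zip": "131", "diagnosis": "flu"},
    {"age": 47, "zip": "131", "diagnosis": "diabetes"},
]

generalized = [generalize_age(record) for record in raw]

print(is_k_anonymous(raw, ["age", "zip"], k=2))          # exact ages are unique
print(is_k_anonymous(generalized, ["age", "zip"], k=2))  # ranges form groups of 2
```

In the raw table every (age, zip) pair is unique, so the check fails for k = 2. After ages are generalized to ranges, each quasi-identifier combination covers two records and the check passes.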

KEYWORDS

K-anonymity, Spark, Hadoop, Distributed system, Data privacy



Journal of The Korea Society of Computer and Information 2023 KCI Impact Factor : 0.65
