Detection of Malicious PDF based on Document Structure Features and Stream Object

Ah Reum Kang (강아름); JEONG, YEONG SEOP (정영섭); Se Lyeong Kim (김세령); Jong-hyun Kim (김종현); Jiyoung Woo (우지영); Sunoh Choi (최선오)

doi:10.9708/jksci.2018.23.11.085

Journal of The Korea Society of Computer and Information
Abbr : JKSCI
2018, 23(11), pp.85~93
Publisher : The Korean Society Of Computer And Information
Research Area : Engineering > Computer Science
Received : October 1, 2018
Accepted : November 1, 2018
Published : November 30, 2018

Ah Reum Kang ¹, JEONG, YEONG SEOP ¹, Se Lyeong Kim ², Jong-hyun Kim ³, Jiyoung Woo ¹, Sunoh Choi ³

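¹Soonchunhyang University
²Korea Internet & Security Agency (KISA)
³Electronics and Telecommunications Research Institute (ETRI)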

Accredited

ABSTRACT

In recent years, attackers have increasingly distributed malicious code by exploiting vulnerabilities in document files. Because document-based malware is not itself an executable file, it easily bypasses existing security programs, so a model for detecting it is needed. In this study, we extract key features from the document structure and from the JavaScript contained in stream objects. In addition, when JavaScript is embedded, we extract keywords that occur frequently in malicious code, such as function names, reserved words, and readable strings in the script. We then build a machine learning model that distinguishes benign documents from malicious ones. To make the detector difficult to evade, we aim for good performance with a black-box algorithm. The experiment analyzes a larger number of documents than previous studies. The results show a 98.9% detection rate with three different types of algorithms. SVM, a black-box algorithm that makes evasion through obfuscation difficult, performs much better than in previous studies.
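The feature-extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the actual features, so the keyword lists `STRUCTURE_KEYWORDS` and `JS_KEYWORDS`, the helper names, and the regex-based stream parsing are all assumptions made for illustration. The sketch counts structural name keywords in the raw file, inflates Flate-compressed stream objects, and counts script keywords inside the stream contents.

```python
import re
import zlib

# Hypothetical keyword lists: the abstract does not enumerate the real features.
STRUCTURE_KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/ObjStm", b"/Launch"]
JS_KEYWORDS = [b"eval", b"unescape", b"fromCharCode", b"getAnnots"]

# Simplified stream matcher; a real parser would honor the /Length entry instead.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def decode_streams(pdf_bytes):
    """Yield each stream body, inflated when it is zlib (FlateDecode) data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            # decompressobj tolerates the trailing end-of-line after the data.
            data = zlib.decompressobj().decompress(body)
        except zlib.error:
            data = body  # not Flate-compressed: keep the raw bytes
        yield data


def extract_features(pdf_bytes):
    """Return keyword counts that could serve as a classifier feature vector."""
    features = {}
    for kw in STRUCTURE_KEYWORDS:
        # Require that no further name character follows, so a short
        # keyword is not counted inside a longer name.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        features[kw.decode()] = len(re.findall(pattern, pdf_bytes))
    streams = b"\n".join(decode_streams(pdf_bytes))
    for kw in JS_KEYWORDS:
        features["js:" + kw.decode()] = streams.count(kw)
    return features
```

Calling `extract_features(open(path, "rb").read())` for each document yields one count vector per file; such vectors could then be passed to any classifier, including an SVM as mentioned in the abstract. Obfuscated names (for example hex-escaped `/J#53`) are not handled here.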

KEYWORDS

malware, PDF, machine learning, JavaScript, detection

