Research on Malicious PDF Document Detection Technology

Posted on:2018-04-20

Degree:Master

Type:Thesis

Country:China

Candidate:D Feng

Full Text:PDF

GTID:2348330542490834

Subject:Engineering

Abstract/Summary:

With the rapid development of the Internet, electronic documents are exchanged ever more frequently for information sharing and data distribution, and the Portable Document Format (PDF) has become the de facto standard for electronic document transmission worldwide. However, the same properties that make PDF documents easy to distribute and easy to extend have also made them an important carrier for malicious attacks, and a growing number of researchers are paying attention to PDF security. Existing malicious PDF document detection methods still have shortcomings. Improving detection accuracy and adapting to newly emerging malicious PDF documents are therefore the focus of current research on malicious PDF document detection.

This thesis reviews the research background, significance, and state of the art of malicious PDF document detection, explains that most malicious PDF documents rely on JavaScript code, and analyzes the attack modes and propagation channels of existing malicious PDF documents. On this basis, a malicious PDF document detection system is proposed and implemented.

For feature generation and feature extraction, the thesis first proposes a scheme for extracting the JavaScript code embedded in PDF documents and applies anti-obfuscation processing targeted at several code obfuscation methods that are common at present. This effectively restores the original code and improves the accuracy of malicious PDF document detection. Second, based on the characteristics of malicious PDF documents, text features are generated with the TF-IDF algorithm and the generated features are analyzed. PCA-based feature extraction is then applied to obtain a suitable multidimensional feature vector for each document.

For the detection model, the thesis first proposes an improved OCSVM classifier that raises detection accuracy by building sub-models for specific features of malicious PDF documents. Second, because traditional detection models cannot make effective use of large numbers of unlabeled PDF documents for learning and training, a static detection model is built on the Tri-training semi-supervised learning algorithm to improve the detection capability and generalization ability of the system. Finally, because the static detection model cannot detect zero-day malicious PDF documents, a dynamic detection method based on libemu is proposed as a supplement to it.

The experimental results show that, compared with traditional malicious PDF document detection techniques, the proposed detection system achieves higher detection accuracy on malicious PDF documents, which verifies the feasibility of the proposed approach.
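
The first stages described above (JavaScript extraction, anti-obfuscation, and TF-IDF feature generation) can be illustrated with a minimal sketch. It is not the implementation used in the thesis, which the abstract does not specify. The function names (`extract_js_literals`, `deobfuscate`, `tokenize`, `tf_idf`) and the sample input are hypothetical. The sketch assumes that the script is stored as an uncompressed literal string after a `/JS` key, handles only two illustrative obfuscation patterns, and leaves out PCA, the OCSVM and Tri-training classifiers, and the libemu stage.

```python
import math
import re
from collections import Counter


def extract_js_literals(pdf_bytes: bytes) -> list[str]:
    """Return scripts stored as literal strings after a /JS key.

    Assumption: the script is an uncompressed PDF literal string.
    Scripts held in compressed streams are not handled here.
    """
    scripts = []
    for match in re.finditer(rb"/JS\s*\(", pdf_bytes):
        depth, i, out = 1, match.end(), bytearray()
        while i < len(pdf_bytes):
            ch = pdf_bytes[i:i + 1]
            if ch == b"\\" and i + 1 < len(pdf_bytes):
                # Simplification: keep the escaped byte as-is, which is
                # only right for \( \) and \\ (not for \n, octal, etc.).
                out += pdf_bytes[i + 1:i + 2]
                i += 2
                continue
            if ch == b"(":
                depth += 1
            elif ch == b")":
                depth -= 1
                if depth == 0:
                    break
            out += ch
            i += 1
        scripts.append(out.decode("latin-1"))
    return scripts


def deobfuscate(js: str) -> str:
    """Undo two illustrative obfuscation patterns (hypothetical rules)."""
    # Decode String.fromCharCode(104, 105) into the literal "hi".
    js = re.sub(
        r"String\.fromCharCode\(([\d\s,]+)\)",
        lambda m: '"' + "".join(
            chr(int(n)) for n in m.group(1).split(",") if n.strip()
        ) + '"',
        js,
    )
    # Repeatedly merge string literals joined by '+'.
    previous = None
    while previous != js:
        previous = js
        js = re.sub(r'"([^"]*)"\s*\+\s*"([^"]*)"', r'"\1\2"', js)
    return js


def tokenize(js: str) -> list[str]:
    """Split a script into identifier-like tokens."""
    return re.findall(r"[A-Za-z_$][\w$]*", js)


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by tf * log(N / df), the basic TF-IDF form."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, hand-written object body used only for illustration.
sample_pdf = (
    b'1 0 obj << /S /JavaScript /JS (var f = "une" + "scape"; '
    b'app.alert(String.fromCharCode(104, 105));) >> endobj'
)
scripts = [deobfuscate(s) for s in extract_js_literals(sample_pdf)]
features = tf_idf([tokenize(s) for s in scripts] + [["var", "x"]])
```

Restoring the code before tokenizing matters because a term such as `unescape` only becomes visible to TF-IDF once the split string literals have been merged. In a fuller pipeline, the resulting sparse term weights would be turned into fixed-length vectors, reduced with PCA, and passed to the classifiers.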

Keywords/Search Tags:

Malicious PDF document, Anti-obfuscation, Feature extraction, Document detection, OCSVM

Related items

1	Research On Malicious PDF Document Static Detection Technology Based On Improved N-gram
2	Research And Implementation Of Malicious PDF Document Detection Technology
3	Research Of Massive Chinese Document De-duplication Based On Topic
4	Research On Malicious Webpage And PDF Document Detection Based On SVM Model
5	The PDF Document Generation And Its Content Extraction In ScienceWord
6	Design And Implementation On Document Image Recognition System
7	Research On A Novel Adaptive Anti-obfuscation Model For Detecting Malicious Code
8	Extraction Of Mathematics Formulas In Chinese Scientific Document
9	Research On Intrusion Detection Algorithm Of Industrial Control Systems Based On OCSVM
10	Research And Implementation On Machine Learning-Based Detection Of Malicious Script Codes