Research Of Malicious PDF Document Detection Technology

Posted on: 2018-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: D Feng
GTID: 2348330542490834
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet, electronic documents are transmitted ever more frequently for information exchange and data distribution. The Portable Document Format (PDF) has become the de facto global standard for electronic document transmission. However, the same properties that make PDF documents easy to distribute and easy to extend have also made them an important carrier for malicious attacks, and a growing number of researchers are paying attention to the security of PDF documents. Existing malicious PDF document detection methods still have shortcomings. Improving detection accuracy and adapting to newly emerging malicious PDF documents are the focus of current research on malicious PDF document detection technology.

This thesis reviews the research background, significance, and state of development of malicious PDF document detection, explains that most malicious PDF documents rely on JavaScript code, and analyzes the attack modes and propagation channels of existing malicious PDF documents. On this basis, a malicious PDF document detection system is proposed and implemented.

For feature generation and feature extraction, the thesis first proposes a scheme for extracting the JavaScript code embedded in PDF documents and applies de-obfuscation (anti-obfuscation) to several common code obfuscation methods, which effectively restores the original code and improves the accuracy of malicious PDF document detection. Second, according to the characteristics of malicious PDF documents, text features are generated with the TF-IDF algorithm and the generated features are analyzed. Feature extraction based on the PCA algorithm is then applied to obtain the desired multidimensional feature vector of a malicious PDF document.

For the detection model, the thesis first proposes an improved OCSVM classifier, which improves detection accuracy by building sub-models for specific malicious PDF document features. Second, because traditional detection models cannot make effective use of large numbers of unknown (unlabeled) PDF documents for learning and training, a static detection model based on the Tri-training semi-supervised learning algorithm is established to improve the detection capability and generalization ability of the detection system. Finally, because the static detection model cannot detect 0day malicious PDF documents, a dynamic detection method based on libemu is proposed as a supplement to the static model. The experimental results show that, compared with traditional malicious PDF document detection techniques, the proposed detection system detects malicious PDF documents more accurately, which verifies the feasibility of the approach.
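
The de-obfuscation and TF-IDF feature generation steps summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not name the obfuscation methods it handles, so the two patterns below (split string literals and `String.fromCharCode` calls) are assumptions chosen for illustration, and the function names `deobfuscate`, `tokenize`, and `tfidf` are hypothetical. The weighting uses the textbook form tf × log(N / df), which may differ from the variant used in the thesis.

```python
import math
import re
from collections import Counter

# Two adjacent double-quoted literals joined by '+', e.g. "ev" + "al".
_CONCAT = re.compile(r'"([^"\\]*)"\s*\+\s*"([^"\\]*)"')
# String.fromCharCode(101, 118, ...) with decimal arguments only.
_FROM_CHAR_CODE = re.compile(
    r'String\.fromCharCode\(\s*(\d{1,6}(?:\s*,\s*\d{1,6})*)\s*\)'
)
# Identifiers and numbers; punctuation and quotes are dropped.
_TOKEN = re.compile(r'[A-Za-z_$][A-Za-z0-9_$]*|\d+')


def deobfuscate(js: str) -> str:
    """Undo two simple obfuscations (assumed patterns, not the thesis's list)."""
    def decode(match: re.Match) -> str:
        codes = [int(c) for c in match.group(1).split(',')]
        return '"' + ''.join(chr(c) for c in codes) + '"'

    js = _FROM_CHAR_CODE.sub(decode, js)
    # Fold repeatedly so that "a" + "b" + "c" collapses to "abc".
    while True:
        folded = _CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', js)
        if folded == js:
            return js
        js = folded


def tokenize(js: str) -> list[str]:
    """Split JavaScript source into identifier and number tokens."""
    return _TOKEN.findall(js)


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: tf * log(N / df)} mapping per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty scripts
        vectors.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors
```

In this sketch, a term that occurs in every script (such as `var`) receives weight zero, while a term that only becomes visible after de-obfuscation (such as `eval` recovered from `"ev" + "al"`) receives a positive weight. The resulting sparse vectors would then be passed to a dimensionality reduction step such as PCA before classification.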
Keywords/Search Tags: Malicious PDF document, Anti-obfuscation, Feature extraction, Document detection, OCSVM