Research On Malicious PDF Document Static Detection Technology Based On Improved N-gram

Posted on: 2018-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: J P Xu
Full Text: PDF
GTID: 2348330536968320
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology and the spread of office automation, PDF has become an essential document format for work and study. Although PDF documents bring a great deal of convenience, their use has also given rise to many security problems. An attacker can exploit vulnerabilities in the PDF file format to embed malicious JavaScript code, attack a specific target, obtain its private information, and cause it immeasurable losses. The detection and prevention of PDF documents with embedded malicious JavaScript code has therefore become an important goal for information security researchers at home and abroad.

This thesis analyzes the PDF document, introducing its physical and logical structure, the attack techniques that target PDF documents, and the ways in which malicious PDF documents are spread. An in-depth analysis of existing N-gram based static detection models for malicious PDF documents reveals two shortcomings. First, they ignore the effect that information hidden in the PDF document has on the completeness of the extracted JavaScript code, and they do not preprocess the extracted JavaScript code. Second, the N-gram feature extraction method can extract only fixed-length N-gram features, so effective features are split apart.

This thesis proposes an improved N-gram static detection model for malicious PDF documents. It designs a PDF document preprocessing pipeline, comprising decryption, decoding, JavaScript location and extraction, and JavaScript deobfuscation, to ensure that the extracted JavaScript code is complete and valid. It also improves on the existing N-gram feature extraction method so that more effective N-gram feature vectors are extracted.

To verify the improved N-gram feature extraction method, features are extracted with it, and the resulting feature vectors are used as input data for training and testing with a variety of detection algorithms. The detection algorithms are then combined with the Boosting algorithm for further training and testing. The test results verify that the improved N-gram feature extraction method is effective for detecting malicious PDF documents and improves detection relative to the original N-gram feature extraction method, that the Boosting algorithm can improve detection performance, and that the resulting model outperforms the DPScan and PJScan models.
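The second shortcoming named in the abstract, that fixed-length N-grams split effective features apart, can be illustrated with a minimal sketch. It shows only the conventional fixed-length baseline, not the thesis's improved method, which the abstract does not specify. The function name `char_ngrams`, the character-level granularity, the window length of 4, and the sample JavaScript fragment are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count every contiguous substring of length n (sliding window, step 1).

    Hypothetical helper: a plain fixed-length N-gram extractor, used here
    only as a stand-in for the baseline the abstract criticizes.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical fragment of extracted JavaScript (not taken from the thesis).
sample = "unescape(x)"
features = char_ngrams(sample, 4)

# With n fixed at 4, the 8-character token "unescape" never appears as a
# single feature; it is spread over overlapping pieces such as "unes" and "cape".
print("unescape" in features)  # False
print("unes" in features)      # True
```

Because every feature has exactly `n` characters, any token longer than `n` can only be represented by several overlapping fragments, which is one way to read the "effective features are split apart" limitation.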
Keywords/Search Tags: PDF document, JavaScript code, N-gram feature extraction, Boosting algorithm