Research On Malicious PDF Document Static Detection Technology Based On Improved N-gram

Posted on:2018-12-28

Degree:Master

Type:Thesis

Country:China

Candidate:J P Xu

Full Text:PDF

GTID:2348330536968320

Subject:Computer Science and Technology

Abstract/Summary:

With the development of information technology and the spread of office automation, PDF documents have become an essential format for work and study. Although PDF documents bring a great deal of convenience, their use has gradually exposed many security problems. An attacker can exploit a vulnerability in the PDF file format to embed malicious JavaScript code, steal private information from a specific target, and cause immeasurable loss to that target. The detection and prevention of PDF documents with embedded malicious JavaScript code has therefore become an important goal for information security researchers at home and abroad.

This thesis analyzes the PDF document, introducing its physical and logical structure, the attack techniques that target it, and the ways in which malicious PDF documents are spread. An in-depth analysis of existing N-gram based static detection models for malicious PDF documents reveals two shortcomings. First, they ignore the effect that information hidden in the PDF document has on the completeness of the extracted JavaScript code, and they do not preprocess the extracted code. Second, the N-gram feature extraction method can only extract fixed-length N-gram features, so effective features are split apart. To address these shortcomings, this thesis proposes an improved N-gram static detection model for malicious PDF documents. It designs a PDF preprocessing pipeline consisting of decryption, decoding, JavaScript location and extraction, and JavaScript deobfuscation, to ensure that the extracted JavaScript code is complete and valid. It also improves the existing N-gram feature extraction method so that more effective N-gram feature vectors are extracted.

To verify the improved N-gram feature extraction method, features are extracted with it and the resulting feature vectors are used as input for training and testing several detection algorithms. The detection algorithms are then combined with the Boosting algorithm and trained and tested again. The test results show that the improved N-gram feature extraction method proposed in this thesis is effective for detecting malicious PDF documents and improves detection performance over the existing N-gram feature extraction method, that the Boosting algorithm further improves detection performance, and that the resulting model performs better than the DPScan and PJScan models.
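The second shortcoming named in the abstract, that fixed-length N-grams split effective features apart, can be illustrated with a minimal sketch. This is not the thesis's improved method, which the abstract does not specify; it is plain fixed-length, character-level N-gram extraction. The function names `char_ngrams` and `ngram_feature_vector`, the sample JavaScript string, and the choice of n = 4 are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return every contiguous substring of length n (sliding window, step 1)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_feature_vector(text: str, n: int, vocabulary: list[str]) -> list[float]:
    """Relative frequency of each vocabulary N-gram in text (hypothetical helper)."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[gram] / total for gram in vocabulary]


# Hypothetical JavaScript fragment, chosen only for illustration.
sample = "eval(unescape(payload))"
grams = char_ngrams(sample, 4)

# The 8-character token "unescape" cannot appear as a single 4-gram;
# it is only present as several overlapping fragments.
fragments = [g for g in grams if g in "unescape"]
print("unescape" in grams)
print(fragments)
```

With a fixed window of four characters, the token `unescape` never appears as one feature, only as overlapping pieces, which is the kind of separation the abstract describes.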

Keywords/Search Tags:

PDF document, JavaScript code, N-gram feature extraction, Boosting algorithm

Related items

1	Clustering Analysis Of Malicious Code Based On N-gram Feature Extraction
2	Bayesian Classifier And Web Document Classification
3	Research And Development Of Malicious Code Detection System Based On N-GRAM
4	Research Of Source Code Plagiarism Detection Method Based On N-gram
5	JavaScript Obfuscation Detection Methods Based On CNN
6	Extraction And Retrieve Of The Feature Of Document Image
7	Research On Key Technologies Of Malware Feature Extraction Based On System Call Analysis
8	Multi-level Feature Selection And Feature Fusion In The Application Of Visual Tracking
9	Research And Implementation Of JavaScript Code Simplification Based On Static Analysis And Dynamic Execution
10	Research On JavaScript Malicious Code Detection Model Based On Anti-obfuscated Technology