
The Research On Web Page Malicious Code Detection Based On Classifier Ensemble

Posted on: 2018-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhu
Full Text: PDF
GTID: 2348330518975632
Subject: Computer Science and Technology
Abstract/Summary:
In an era of rapid Internet growth, people's relationship with the network has become increasingly inseparable. Internet-based services contribute a great deal to people's lives and markedly improve them. However, the network that brings this convenience also brings hidden dangers. Criminals have seen an opportunity in its rapid development: many use malicious code to undermine network security and seek economic gain, and the growth of the network gives them fertile ground. Governments are therefore paying more and more attention to malicious code detection.

Malicious code detection is generally divided into two approaches: static detection and dynamic detection. Static detection extracts features from web pages based on rules and signature (eigenvalue) matching. Dynamic detection runs the suspect code and extracts features from its observed behavior. This thesis targets malicious JavaScript code and combines classifier ensembles with machine learning to detect it. The main work of this thesis is as follows:

1. To address the characteristics of malicious code, the thesis proposes compiling obfuscated JavaScript code into machine code with the V8 engine and simplifying the operands in the machine code into classes. The simplified operands are mixed with the opcodes, and features are extracted with Bi-Gram and Tri-Gram models. The thesis also proposes locating breakpoints in the processed samples based on frequency, entropy, distance, and mutual information, and counting the variable-length N-Grams of each sample. Experiments show that extracting features from the processed operands together with the opcodes expresses machine code behavior at a finer granularity, and that variable-length N-Grams avoid splitting valid sequences, which improves the classification results.

2. Based on a study of common classification algorithms and classifier ensemble methods, an input optimization for ensemble classifiers is proposed to address the problem of a single input. The input data sets are processed in different ways so that each internal classifier receives targeted training, and the resulting models are combined into an ensemble. By adding sub-classifiers, the original single-level ensemble structure is transformed into a multi-level classifier ensemble. In addition, weights are introduced: each classifier is assigned its own weight, and the best weight assignment is found through training. Experiments show that the multi-level weighted classifier ensemble with these optimizations achieves better classification results.

3. Building on the above algorithms, an online malicious code detection system is designed and developed. Users can submit script code or a website address online, and the system detects malicious code quickly. Users can submit a test report and view the test reports submitted by others. If the system detects code as malicious, it automatically saves the code to the database.
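
Two of the ideas in the abstract, N-Gram features over a mixed opcode/operand stream and weighted voting among classifiers, can be illustrated with a minimal sketch. This is not the thesis's implementation: the token stream, the operand classes (`REG`, `MEM`, `IMM`), the helper names `ngram_counts` and `weighted_vote`, and the weights are all hypothetical, and the sketch uses fixed-length N-Grams only (the breakpoint-based variable-length scheme and the training of weights are not reproduced here).

```python
from collections import Counter
from typing import Dict, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous run of n tokens in the sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_vote(predictions: Sequence[int], weights: Sequence[float]) -> int:
    """Return the label whose supporting classifiers carry the largest total weight."""
    totals: Dict[int, float] = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical machine-code stream: opcodes mixed with operands that have
# been reduced to coarse classes (register, memory, immediate).
tokens = ["mov", "REG", "IMM", "mov", "REG", "IMM", "call", "MEM"]
bigrams = ngram_counts(tokens, 2)   # Bi-Gram features
trigrams = ngram_counts(tokens, 3)  # Tri-Gram features

# Three hypothetical base classifiers vote (1 = malicious, 0 = benign).
# The weights are made up here; the abstract says they are found by training.
label = weighted_vote([1, 0, 0], [0.6, 0.25, 0.25])
```

With these made-up weights the single heavily weighted classifier outvotes the two lighter ones, which is the behavior that distinguishes weighted voting from a plain majority vote.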
Keywords/Search Tags:machine learning, variable length N-Gram, classifier ensemble, machine code