The Research On Web Page Malicious Code Detection Based On Classifier Ensemble

Posted on:2018-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Zhu

GTID:2348330518975632

Subject:Computer Science and Technology

Abstract/Summary:

In this era of rapid Internet growth, people and the network have become increasingly inseparable. Network services contribute greatly to society and have markedly improved people's lives. However, the network that brings this convenience also brings hidden dangers. Criminals see an opportunity in its rapid development: many use malicious code to undermine network security for economic gain, and the fast-growing network gives them fertile ground. Governments are therefore paying more and more attention to malicious code detection.

Malicious code detection is generally divided into two approaches, static detection and dynamic detection. Static detection extracts web page features based on rules and signature (eigenvalue) matching. Dynamic detection executes the code and extracts features from its runtime behavior. This thesis focuses on malicious JavaScript code and combines classifier ensembles with machine learning to detect it. The main work is as follows:

1. The thesis proposes using the V8 engine to convert obfuscated JavaScript code into machine code and, in light of the characteristics of malicious code, simplifying the operands in the machine code by classifying them. The simplified operands are mixed with the opcodes, and features are extracted with Bi-Gram and Tri-Gram. It further proposes locating breakpoints in the processed samples based on frequency, entropy, distance and mutual information, and counting the variable length N-Grams of each sample. Experiments show that extracting features from the processed operands together with the opcodes expresses machine code behavior at a finer granularity, and that variable length N-Grams avoid splitting valid sequences, which improves classification.

2. Based on a study of common classification algorithms and classifier ensemble methods, an input optimization for ensemble classifiers is proposed to address the single-input problem. The input data sets are processed in different ways so that each internal classifier receives targeted training, and the resulting classification models are combined into an ensemble. By adding sub-classifiers, the original single-level ensemble structure is transformed into a multi-level classifier ensemble. In addition, weights are introduced: each classifier is assigned its own weight, and the best weight assignment is found through training. Experiments show that the multi-level weighted classifier ensemble with these optimizations achieves better classification results.

3. On the basis of the above algorithms, an online malicious code detection system is designed and developed. Users can submit script code or a website address online, and the system detects it quickly. Users can also submit detection reports and view reports submitted by others. If the system detects code as malicious, it automatically saves the code to the database.
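
As a rough illustration of two ideas in the abstract, the Python sketch below counts fixed-length Bi-Grams and Tri-Grams over a token stream of opcodes mixed with simplified operand classes, and combines classifier outputs by weighted voting. It is not the thesis implementation: the token names (REG, IMM, MEM), the function names, and the weights are all assumptions made for the example, and the variable length N-Gram breakpoint search is not shown.

```python
from collections import Counter

# Hypothetical token stream: opcodes mixed with simplified operand classes.
# REG, IMM and MEM are illustrative class names, not taken from the thesis.
tokens = ["mov", "REG", "IMM", "push", "REG", "call", "MEM", "mov", "REG", "IMM"]


def ngram_counts(seq, n):
    """Count fixed-length n-grams (tuples of n consecutive tokens)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def weighted_vote(labels, weights):
    """Return the label backed by the largest total classifier weight."""
    totals = Counter()
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# Three hypothetical base classifiers vote. The weights are made up here;
# the abstract describes finding the weight assignment by training.
decision = weighted_vote(["malicious", "benign", "malicious"], [0.5, 0.3, 0.2])
```

The N-gram counts would typically serve as the feature vector for each sample, and the weighted vote stands in for the final layer of the ensemble.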

Keywords/Search Tags:

machine learning, variable length N-Gram, classifier ensemble, machine code

Related items

1	The Study Of Malicious Code Detection Based On Data Mining And Machine Learning
2	Research On CCSDS Protocol Identification Technology In Spatial Link Layer Based On Ensemble Learning
3	The Research On SAR Image Target Recognition Technology Based On Feature Fusion And Extreme Learning Machine
4	The Development Of Multi-Channel Gamma Spectrum Measure Software Based On Ensemble Classifier
5	Research On Computational Classifier Ensemble Model And Application For Pedestrian Detection
6	Case Studies For Semantic Aware Statistical Machine Learning Applications In Code Security Problems
7	Machine Learning And Optimization Design Of Neural Network Classifier
8	Research On Twitter Emotion Classification Based On Machine Learning
9	Studies On Classifiers Based On Decision Boundaries From The Perspective Of Dividing Data Space
10	Feature Selection Based Ensemble Classification And Its Application