The Study Of Malicious Code Detection Based On Data Mining And Machine Learning

Posted on:2010-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:X K Zhang

Full Text:PDF

GTID:2178360302459611

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

Detecting malicious code has become difficult because of the growing number and variety of malicious code and the popularity of code obfuscation techniques. Traditional signature-based detection is used in all anti-virus software. However, the signature of a piece of malicious code must be obtained before that kind of code can be detected effectively, and the signature is usually acquired only after computers have been infected. This weakness leaves computer systems open to attack by malicious code. Recently, data mining and machine learning techniques have been applied to malicious code detection and have become a research focus: data mining can extract meaningful patterns from large volumes of code data, and machine learning can help summarize the knowledge needed to identify known malicious code. Similarity analysis is then carried out to help detect unknown malicious code.

This paper uses data mining and machine learning techniques to detect malicious code. After introducing the relevant background and theory of malicious code, data mining, and machine learning, it discusses feature extraction and feature selection methods. The contributions of this paper are as follows:

1. A malicious code detection system is implemented that uses variable-length N-grams over binary sequences as the feature extraction method and weighted information gain as the feature selection method. Several classifiers, including Decision Tree, SVM, and Naïve Bayes, are used to detect malicious code in the system.
2. This paper uses a malicious feature extraction method called variable-length N-gram, which makes up for the inability of the fixed-length N-gram to extract features of different lengths. A comparison of our experimental results with Kolter's, which were obtained with the fixed-length N-gram method, shows that the variable-length N-gram performs better than the fixed-length N-gram.
3. This paper proposes a feature selection method based on weighted information gain. By combining the advantages of information gain with class-wise frequency, the method selects effective features more accurately. It makes up for a deficiency of information gain (IG), which takes into account whether a feature is present but ignores how often it occurs. The experimental results show that this method effectively improves the detection rate and the accuracy rate.

The research and experiments described above demonstrate the high efficiency and accuracy of the approach.
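
The abstract names its two building blocks, N-gram features over binary sequences and information gain, without giving formulas. As a rough illustration only, the sketch below shows the standard baseline it says it improves on: fixed-length byte N-grams scored by plain information gain on presence or absence. The variable-length extraction and the class-wise frequency weighting are not specified in the abstract and are not reproduced here. The function names and the four toy byte strings are hypothetical, not taken from the thesis.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> set:
    """Return the set of distinct n-byte sequences that occur in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(counts) -> float:
    """Shannon entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(samples, labels, gram) -> float:
    """Plain IG of the binary feature 'gram is present' with respect to labels.

    Only presence is used, which is the limitation the abstract attributes
    to IG: how often the gram occurs in a file is ignored.
    """
    present, absent = Counter(), Counter()
    for feats, label in zip(samples, labels):
        (present if gram in feats else absent)[label] += 1
    n = len(labels)
    h_class = entropy(Counter(labels).values())
    h_cond = sum(
        sum(part.values()) / n * entropy(part.values())
        for part in (present, absent)
    )
    return h_class - h_cond


# Hypothetical toy corpus: two "malicious" and two "benign" byte strings.
files = [
    b"\x90\x90\xeb\xfe",
    b"\x90\x90\xcc\xcc",
    b"\x55\x8b\xec\x5d",
    b"\x55\x8b\xec\xc3",
]
labels = ["malicious", "malicious", "benign", "benign"]

samples = [byte_ngrams(f, 2) for f in files]
vocab = set().union(*samples)

# Rank every 2-gram by information gain and keep the most informative ones.
ranked = sorted(
    vocab,
    key=lambda g: information_gain(samples, labels, g),
    reverse=True,
)
for gram in ranked[:3]:
    print(gram.hex(), round(information_gain(samples, labels, gram), 3))
```

In a pipeline of the kind the abstract describes, the top-ranked N-grams would become the feature vector passed to a classifier such as a decision tree, SVM, or Naïve Bayes.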

Keywords/Search Tags:

Malicious Code Detection, Data Mining and Machine Learning, Variable Length N-gram, Weighted Information Gain

Related items

1	The Research On Web Page Malicious Code Detection Based On Classifier Ensemble
2	Research And Implementation On Machine Learning-Based Detection Of Malicious Script Codes
3	A Research On Engine Of Behavior-Based Detection Of Malicious Code Technology
4	Research And Development Of Malicious Code Detection System Based On N-GRAM
5	The Research And Implementation Of Android Malware Detection System Based Image Mode
6	Malicious Web Sites Detection Based On Data Mining Algorithms
7	Research On Malicious URL Detection Technology Based On Machine Learning
8	A Research Of Genetic K-Means Algorithm Based On Variable Length Encoding
9	Research And Implementation Of Malicious Behavior Detection In Android Applications
10	Research On Malicious URL Detection Based On Machine Learning