Research On Malicious Code Detection Technology

Posted on: 2012-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Wang
Full Text: PDF
GTID: 2248330395464044
Subject: Computer software and theory
Abstract/Summary:
With the development of information technology, computers have become an important part of daily life. Entertainment, business, and even national production are closely tied to computer technology. As computers have spread, however, so have security threats, which put everyday computer use at risk. Although widely deployed anti-virus software provides a certain degree of security, traditional detection techniques lag behind the constant appearance and rapid spread of unknown malicious code, leaving long-term hidden dangers for computer security. This paper studies malicious code detection based on machine learning and focuses on applying one-class classification methods to address the weak performance of traditional detection technology on unknown malicious code. The details are as follows:

(1) A Feature Representation Method Based on TF-IDF and LSI
In this paper, the frequency distribution of the feature items appearing in the samples constitutes the knowledge structure of the machine learning process. Redundant information and noise concentrated in the original data can impair learning. To avoid this effect, we use the term frequency-inverse document frequency (TF-IDF) method to quantify the original data and form a matrix of weighted features. The latent semantic indexing (LSI) method is then used for feature reconstruction, which strengthens the feature representation and reduces computational cost.

(2) Applying One-class Classification to Malicious Code Detection
Compared with benign code samples, malicious code samples are often difficult to obtain. Two-class classification therefore often suffers from class imbalance when detecting malicious code, and its limited knowledge of malicious samples leads to weak detection of unknown types. One-class classification, trained with benign code samples as the positive class, can distinguish outlier samples (including malicious code of both known and unknown types) from benign samples. Experiments show that applying one-class classification to malicious code detection achieves good detection performance.

(3) A One-class Transductive Support Vector Machine
To make effective use of the large amount of unlabeled sample information, this paper applies the transductive learning mechanism to malicious code detection in order to improve classifier performance. To address the data imbalance caused by the scarcity of outlier samples in malicious code detection, a one-class transductive support vector machine (OCTSVM) is proposed by adapting the two-class transductive support vector machine. OCTSVM uses unlabeled samples to adjust the training set adaptively, which makes the data distribution of the training set more accurate and improves detection ability.

(4) Study of the Imbalance Problem
Because malicious code samples are scarce and unlabeled samples are plentiful, the imbalance problem is widespread in malicious code detection. It easily biases the classifier toward the class with more samples and distorts both the classification results and the performance evaluation. By adjusting samples adaptively, OCTSVM improves its ability to cope with the imbalance problem. In addition, using AUC, an indicator that is insensitive to class imbalance, to measure the performance of a classification method gives a more reliable evaluation than accuracy.
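
The ideas in items (1), (2), and (4) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the opcode-style tokens and sample lists are hypothetical, the IDF is smoothed so unseen tokens get a finite weight, the LSI step is omitted, and a distance-to-centroid score stands in for the one-class SVM. It shows TF-IDF weighting fitted on benign samples only, outlier scoring against the benign class, and why AUC is more informative than accuracy on an imbalanced test set.

```python
import math
from collections import Counter, defaultdict

def fit_idf(train_docs):
    """Fit smoothed IDF weights on the (benign-only) training samples."""
    n = len(train_docs)
    df = Counter(tok for doc in train_docs for tok in set(doc))
    # Tokens never seen in training have df == 0 and receive the largest weight.
    return lambda tok: math.log((1 + n) / (1 + df[tok])) + 1.0

def tfidf(doc, idf):
    """Sparse TF-IDF vector: relative term frequency times IDF."""
    counts = Counter(doc)
    return {tok: (c / len(doc)) * idf(tok) for tok, c in counts.items()}

def centroid(vectors):
    """Mean of sparse vectors; a stand-in for a trained one-class model."""
    total = defaultdict(float)
    for vec in vectors:
        for tok, weight in vec.items():
            total[tok] += weight
    return {tok: w / len(vectors) for tok, w in total.items()}

def distance(vec, center):
    """Euclidean distance, used here as the outlier score."""
    keys = set(vec) | set(center)
    return math.sqrt(sum((vec.get(k, 0.0) - center.get(k, 0.0)) ** 2 for k in keys))

def auc(scores, labels):
    """Chance that a random outlier outscores a random benign sample (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical token sequences; real feature items would come from the samples.
benign_train = [
    ["push", "mov", "call", "ret"],
    ["mov", "add", "call", "ret"],
    ["push", "mov", "add", "ret"],
]
benign_test = [["push", "mov", "call", "ret"], ["mov", "add", "call", "ret"]] * 4
malicious_test = [["xor", "xor", "jmp", "int"]]

idf = fit_idf(benign_train)
center = centroid([tfidf(d, idf) for d in benign_train])

test_docs = benign_test + malicious_test
labels = [0] * len(benign_test) + [1] * len(malicious_test)  # 1 = outlier
scores = [distance(tfidf(d, idf), center) for d in test_docs]

# Baseline that calls everything benign: same score for every sample.
baseline_scores = [0.0] * len(test_docs)
baseline_accuracy = labels.count(0) / len(labels)

model_auc = auc(scores, labels)
baseline_auc = auc(baseline_scores, labels)

print(f"always-benign accuracy: {baseline_accuracy:.3f}")
print(f"always-benign AUC:      {baseline_auc:.3f}")
print(f"centroid scorer AUC:    {model_auc:.3f}")
```

On this toy data the always-benign baseline is correct on 8 of 9 samples yet has an AUC of 0.5, while the centroid scorer ranks the outlier above every benign sample. Accuracy rewards the majority-class guess; AUC does not.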
Keywords/Search Tags: Malicious Code Detection, One-class Classification, Feature Representation, One-class Transductive Support Vector Machine, AUC