
Study On The Intelligent Methods For Detection Of Computer Viruses

Posted on: 2008-02-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Y Zhang
GTID: 1118360242999352
Subject: Computer Science and Technology
Abstract/Summary:
Computer viruses are among the most serious threats to information security because of the damage they cause and the speed at which they spread. As viruses become more complex and sophisticated, the classical scanning detection method can no longer detect the various forms of virus code effectively. It is crucial to develop new methods for defending against viruses. In this dissertation, we explore intelligent methods for automatically detecting viruses based on statistical learning theory. The main contributions of the dissertation are summarized as follows:

Firstly, a multi-naïve Bayes algorithm for detecting computer viruses automatically is presented. The model monitors programs in a virtual machine to learn their behavior. As a program interacts with the operating system at runtime, the most relevant API calls are extracted as the feature vector for the detection engine. After training, the multi-naïve Bayes classifier can be used to check for malicious files. It is an efficient method for detecting polymorphic viruses.

Secondly, an intelligent virus detection system based on a fuzzy pattern recognition algorithm is proposed. In this method, program files are expressed as fuzzy sets, and the principle of fuzzy closeness optimization is then applied to classify the samples. Experimental results show that the method can detect known and unknown viruses by analyzing their behavior. The accuracy of the detection method is 91.93%.

Thirdly, a method based on the support vector machine (SVM) is proposed for detecting computer viruses. With SVM, the virus detection system generalizes well even when the sample dataset is small. An experiment using system API function call traces is given to illustrate the performance of this model. It is found that the SVM-based detection system needs less prior knowledge than other methods and can shorten the training time for the same detection performance.
Encrypted viruses, obfuscated viruses and dynamic load library viruses can be detected by analyzing the behavior information of the programs.

Fourthly, motivated by the standard signature-based technique for detecting viruses, we explore the idea of automatically detecting viruses by means of n-gram analysis. The original sample data is preprocessed with the knowledge reduction algorithm of rough set theory, and redundant features are eliminated from the working sample dataset to reduce the dimensionality of the sample data. The detection system categorizes a program as either normal or abnormal by a statistical method. It does not need to extract the characteristic code of viruses before detection. An efficient implementation for calculating the relative core, based on the positive region definition, is presented.

Fifthly, we generalize the problem of neural network ensembles by using a modified bagging method to detect previously unknown viruses. After features are selected based on information gain, the probabilistic neural network is used in building and testing the proposed ensemble system. Experimental results produced by the proposed detection engine show improved generalization compared with the classical bagging method, and the approach yields greater efficiency than the attribute bagging method.

Last, we present a virus detection system based on the D-S theory of evidence, in which dynamic and static analysis methods are combined. The detection engine applies two types of classifier, a support vector machine and a probabilistic neural network, to detect viruses. For the SVM classifier, we extract the feature vector by monitoring the samples; the static features of the samples are used in the probabilistic neural network classifier. Finally, the D-S theory of evidence is used to combine the contributions of the individual classifiers into the final decision.

The approach to belief estimation is the key to D-S theory.
We propose a new method based on a statistical measure of the individual classifier. In general, the main aim of a classifier, whatever the theory behind it, is to enlarge the inter-class distances. That is to say, the better a classifier can discriminate between the classes, the better the classification results are. Based on this observation, we use inter-class distances as evidence for our belief.

As is well known, the complexity of Dempster's combination rule is #P-complete. In the domain of virus detection, however, we prove that its time complexity is O(N) in a restricted setting. This shows that the presented method is efficient for virus detection. Comparison experiments on polymorphic viruses show that our method performs better than commercial-grade antivirus tools.
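
To make the combination step concrete, the sketch below applies Dempster's rule to two classifier outputs in time linear in the number of classes. It assumes that each classifier places mass only on single classes and on the whole frame ("unknown"), and that each input sums to 1. This restriction is chosen for illustration and is not necessarily the one proved in the dissertation. The function name, the class labels and the mass values are all hypothetical.

```python
def combine_simple_support(m1, theta1, m2, theta2):
    """Combine two mass functions with Dempster's rule in O(N) time.

    Assumption (illustrative only): each source assigns mass to single
    classes (dicts m1, m2) and to the whole frame (theta1, theta2), and
    each source's masses sum to 1.
    """
    classes = set(m1) | set(m2)
    unnormalized = {}
    for c in classes:
        a, b = m1.get(c, 0.0), m2.get(c, 0.0)
        # {c} is the intersection of {c} with {c}, and of {c} with the frame.
        unnormalized[c] = a * b + a * theta2 + theta1 * b
    # The frame survives only when both sources are uncommitted.
    unnormalized_theta = theta1 * theta2
    # Everything left over is conflict, so this total equals 1 - K.
    agreement = sum(unnormalized.values()) + unnormalized_theta
    if agreement <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    fused = {c: v / agreement for c, v in unnormalized.items()}
    return fused, unnormalized_theta / agreement


# Hypothetical belief values for one sample from the two classifiers.
svm_mass, svm_unknown = {"virus": 0.7, "benign": 0.1}, 0.2
pnn_mass, pnn_unknown = {"virus": 0.6, "benign": 0.2}, 0.2

fused, unknown = combine_simple_support(svm_mass, svm_unknown,
                                        pnn_mass, pnn_unknown)
# virus is about 0.85, benign about 0.10, unknown about 0.05
print({c: round(v, 3) for c, v in fused.items()}, round(unknown, 3))
```

The general rule enumerates pairs of arbitrary subsets of the frame, which is where the high complexity comes from. Under the singleton-plus-frame assumption, each class needs only three products, so the cost grows linearly with the number of classes.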
Keywords/Search Tags:Information Security, Computer Viruses, Statistical Learning Theory, Fuzzy Pattern Recognition, Rough Set Theory, Dempster-Shafer Theory of Evidence