Data Mining-based Virus Detection System

Posted on: 2007-08-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Ye | Full Text: PDF
GTID: 2178360182973209 | Subject: Computer application technology
Abstract/Summary:
The proliferation of malware in recent years poses a serious threat to the security of computer systems. Polymorphic viruses, which use obfuscation techniques, are harder to detect than their original versions, and new, previously unseen viruses are harder still; the classic signature-based detection technique used by antivirus companies is often ineffective against both. In this paper, we propose a new approach to detecting polymorphic and even unknown malware on the Windows platform, based on a data mining technique, namely the OOA mining algorithm. The approach analyzes the Win API calling sequence of a PE file, which reflects the behavior of that piece of code, and the analysis is carried out directly on the PE code. It proceeds in three major steps: first, construct the API calling sequences for both the training set and the testing set; second, extract the OOA rules from the training set using the OOA mining algorithm; finally, classify the files in the testing set according to the OOA rules produced by the OOA rules generator.

We implement a malware detection system, the DMAV system, to evaluate the effectiveness of the proposed approach. It comprises three major modules:

1. PE Parser: because a virus scanner is a speed-sensitive application, we develop our own PE parser to construct the API calling sequence of a PE file instead of using a third-party disassembler, in order to improve system performance. To support further analysis in the DMAV system, we also implement three other functions: extraction of function calls from a PE file's export table, extraction of a PE file's section information, and disassembly of a PE file.
2. OOA Rules Generator: after the PE parser extracts the Win API sequences of the training set, we store these sequences in the database as signatures, pass them through the OOA rules generator to mine the rules that satisfy the specified objective, and store those rules in the OOA rules database. We implement three different algorithms for OOA mining: OOA_Apriori, OOA_FPgrowth, and OOA_DMAV_FPgrowth.
3. Malware Detection Module: to determine whether a PE file in the testing set is malware, we pass its API sequence, as constructed by the PE parser, together with each rule in the OOA rules database, through the malware detection module.

The contribution of this paper is to optimize signature creation using the OOA mining algorithm. The experimental results show that, of the three OOA mining algorithms, our OOA_DMAV_FPgrowth algorithm is the most efficient. The results also demonstrate the robustness and intelligence of the DMAV system: compared with several popular anti-virus software products, it detects not only known viruses but also polymorphic and previously unseen malware effectively and efficiently.
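The mining and detection steps can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes that OOA stands for objective-oriented association, that a rule has the form "API set -> malware" filtered by support and confidence thresholds, and that each file's API calling sequence can be simplified to a set of API names. The function names `mine_rules` and `is_malware`, the thresholds, and the toy training data are all hypothetical; candidate generation follows the general Apriori idea and is not the OOA_DMAV_FPgrowth algorithm.

```python
from itertools import combinations


def mine_rules(samples, min_support=0.5, min_confidence=1.0, max_len=2):
    """Mine rules of the (assumed) form: API set -> malware.

    samples: list of (set of API names, is_malware flag).
    Returns a list of (frozenset of APIs, support, confidence).
    """
    n = len(samples)
    rules = []
    # Only APIs that occur in malware samples can support the objective.
    items = sorted({api for apis, is_mal in samples if is_mal for api in apis})
    candidates = [frozenset([api]) for api in items]
    for size in range(1, max_len + 1):
        survivors = []
        for itemset in candidates:
            total = sum(1 for apis, _ in samples if itemset <= apis)
            with_obj = sum(1 for apis, is_mal in samples
                           if is_mal and itemset <= apis)
            support = with_obj / n
            if with_obj == 0 or support < min_support:
                continue  # prune: supersets cannot be frequent either
            survivors.append(itemset)
            confidence = with_obj / total
            if confidence >= min_confidence:
                rules.append((itemset, support, confidence))
        # Apriori-style step: join surviving sets into larger candidates.
        candidates = sorted(
            {a | b for a, b in combinations(survivors, 2)
             if len(a | b) == size + 1},
            key=sorted,
        )
    return rules


def is_malware(apis, rules):
    """Flag a file if its API set contains the left side of any rule."""
    return any(itemset <= apis for itemset, _, _ in rules)


# Hypothetical toy training set: (imported API names, is_malware).
training = [
    ({"OpenProcess", "WriteProcessMemory", "CreateRemoteThread"}, True),
    ({"OpenProcess", "WriteProcessMemory", "CreateRemoteThread",
      "RegSetValueExA"}, True),
    ({"CreateFileA", "ReadFile", "OpenProcess"}, False),
    ({"CreateFileA", "WriteFile"}, False),
]

rules = mine_rules(training)
suspicious = is_malware({"OpenProcess", "WriteProcessMemory"}, rules)
benign = is_malware({"CreateFileA", "OpenProcess"}, rules)
```

In this toy data, `OpenProcess` on its own also appears in a benign file, so it passes the support threshold but fails the confidence threshold and yields no rule; only API sets seen exclusively in malware samples become rules.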
Keywords/Search Tags: Malware/Virus, PE file, Win API sequence, Data mining, OOA mining