Research On Malware Classification Based On Multi Classifier Integration

Posted on: 2020-11-08  Degree: Master  Type: Thesis
Country: China  Candidate: W Ding  Full Text: PDF
GTID: 2518306047498424  Subject: Master of Engineering
Abstract/Summary:
With the rapid development of Internet technology, malware is spreading ever more widely and poses serious risks to Internet security. At the same time, as metamorphism, polymorphism, obfuscation, and similar techniques mature, the self-protection mechanisms of malware have steadily strengthened, which makes traditional signature-based scanning increasingly ineffective. Over the years, researchers have drawn on the self-learning ability of machine learning to propose a large number of malware recognition methods. Machine learning can distill recognition experience from a large number of known samples and generate models that not only recognize known malware well but also predict unknown malware with high accuracy. To address the low accuracy of malware classification, this thesis proposes two malware classification methods: a recurrent neural network (RNN) classification method based on API call sequences, and a classification method based on multi-classifier integration. The main work is as follows:

1) Feature selection based on information gain. The features used in this thesis include N-gram opcode sequence features, PE structure information features, and API call sequence features generated from disassembly files. To improve both the speed and the accuracy of malware recognition, the information gain algorithm is applied to the extracted features to filter out features of low importance. Feature selection covers four feature types: opcode sequences, import table information, section names, and API function calls.

2) An RNN classification method based on API call sequences. First, the disassembly file is analyzed, and the API functions it invokes are extracted and arranged in top-down order. Exploiting the strength of RNNs on sequential data, this thesis uses a bidirectional RNN as the classifier trained on these feature sequences, analyzes its detection accuracy for different malware families, and compares it with a random forest based on opcode sequences and an SVM classifier based on PE structure.

3) A malware classification method based on multi-classifier integration. Besides the RNN based on API call sequences, this thesis also implements an opcode-based random forest and a PE-based SVM classifier. Experimental analysis on the samples collected for this thesis shows that these methods are feasible. The three classifiers are integrated by soft voting to implement the multi-classifier integration method, which is then compared experimentally with a multi-feature fusion method. The results show that the integrated method reaches a final accuracy of 93.2%, which is 1.8 percentage points higher than the multi-feature fusion method.
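Two of the steps named in the abstract, information-gain feature selection and soft voting, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names, the restriction to binary (present/absent) features, the unweighted average in the vote, and all numbers in the usage example are assumptions made here for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_present, labels):
    """Information gain of one binary feature with respect to the labels.

    Assumption: each feature is reduced to present/absent per sample,
    e.g. "this opcode N-gram occurs in the sample".
    """
    with_f = [y for x, y in zip(feature_present, labels) if x]
    without_f = [y for x, y in zip(feature_present, labels) if not x]
    # Conditional entropy: entropy of each partition, weighted by its size.
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (with_f, without_f) if part)
    return entropy(labels) - conditional


def soft_vote(probabilities):
    """Average per-class probabilities from several classifiers.

    Assumption: all classifiers are weighted equally.
    Returns (index of the winning class, averaged probabilities).
    """
    n_classes = len(probabilities[0])
    averaged = [sum(p[c] for p in probabilities) / len(probabilities)
                for c in range(n_classes)]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


# Hypothetical data: four samples from two families, two candidate features.
families = ["A", "A", "B", "B"]
splits_families = [1, 1, 0, 0]   # present only in family A
uninformative = [1, 0, 1, 0]     # present equally often in both families

# Hypothetical per-family probabilities from the three classifiers.
rnn_probs = [0.90, 0.05, 0.05]
rf_probs = [0.40, 0.50, 0.10]
svm_probs = [0.40, 0.50, 0.10]
winner, averaged = soft_vote([rnn_probs, rf_probs, svm_probs])
```

In this made-up example the random forest and the SVM each narrowly prefer the second family, so a majority (hard) vote would pick it, while the averaged probabilities favor the first family because the RNN is far more confident. That sensitivity to classifier confidence is the usual reason for preferring soft voting over hard voting.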
Keywords/Search Tags:Malicious software, Voting, RNN, Fusion Model, Static Analysis