Research On Analysis Of Malware Based On Machine Learning And Intelligent Detection Technology

Posted on: 2015-08-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Liu
Full Text: PDF
GTID: 1108330464971599
Subject: Applied Mathematics
Abstract/Summary:
With the rapid popularization of the Internet, personal computers, and mobile computing platforms, malicious software is emerging and growing rapidly as well, and it seriously threatens computer users’ information security. The author studied the difficult issues in malware behavior analysis.

In behavior analysis, the author proposes using time-series data to identify malware variants. To address the problems of existing solutions, the author designed the SimHash-LCS algorithm. To preserve the detailed information of malicious behaviors without reducing efficiency, the SimHash concept is introduced to convert behavior records into numerical values, and a corresponding fuzzy-equality algorithm is designed. For sequence comparison, the longest common subsequence (LCS) is introduced; it suits similarity evaluation between two sequences of greatly different lengths and can filter out noise data. Experimental results show that the algorithm is considerably more effective than the mainstream algorithms, including dynamic time warping and minimum edit distance, and that it can effectively identify malware variants. Because of these advantages, the algorithm can also be applied in other fields that need time-series analysis, especially when the sequences being matched differ greatly in length and noise filtering must perform well.

A BP neural network is introduced into the field of malware behavior classification. An appropriate data conversion algorithm was designed, and experiments were used to find the best combination of operators and parameters for the network. Ultimately, a suitable BP neural network was designed. Experiments show that the network achieves higher classification accuracy than the KNN and NB algorithms, and that it already has some practical value.

The paper also attempts to introduce SVM into the field of malware behavior classification.
First, 10-fold cross-validation was used to select the SVM algorithm; experiments were then designed to evaluate each SVM kernel function and find the optimal parameters (C, g). To reduce the experimental workload, the author analyzed theoretically the region in which the optimal combination may appear, used grid search and a genetic algorithm for an initial search in this region, and then performed a refined search with genetic algorithms. Finally, the best parameter pair for the SVM with an RBF kernel was identified. Experimental results show that the SVM achieves classification accuracy close to that of the preceding BP neural network.

Finally, the paper shows that, under existing technical conditions, neither the construction of the behavior library nor behavior capture can guarantee accurate and complete data. To solve the problem of partly missing data, the paper attempts to introduce the concept of gray systems. The gray system and the Extreme Learning Machine (ELM) are integrated to design gray Extreme Learning Machine models. Experiments were carried out to test the model’s anti-interference ability and other indicators. The experimental results show that the gray Extreme Learning Machine model adapts better than the plain ELM in malicious behavior analysis.

In constructing the malicious behavior library, the author gave a formal definition of malware and malicious behavior. Existing security tools were used to set up an integrated platform to track and analyze malware samples, and self-designed XML tags were used to describe malicious behavior specifically. A relatively complete malicious behavior signature library was established by these means.

For application-layer monitoring, the author also proposed a new method that combines module injection with module-free injection.
An ordinary module injection is performed first, in a way intended to go unnoticed by malware; the module then removes itself, so that malicious software cannot detect the presence of the monitoring software. Solutions are listed for some typical technical problems that arise in application. Tests showed that this method has good concealment and universality.

For kernel-level monitoring, a new technique called Secret Inline Hook is proposed, which is an optimization of the SSDT Inline Hook. Its basic idea is to hook the next-layer functions beneath those in the SSDT. Countering this monitoring is an almost impossible task for malware, because it would need to traverse all functions in the lower layer, and there are a large number of them; the method is therefore well concealed. The author gave an example to demonstrate the application of this method, and showed its security and effectiveness through experiments.
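
The SimHash-LCS idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not give the hashing scheme, the fuzzy-equality rule, or the similarity normalization, so the 64-bit MD5-based SimHash, the Hamming-distance threshold `max_distance`, the division by the shorter trace length, and all function names below are assumptions made for illustration only.

```python
import hashlib
from typing import Callable, List, Sequence


def simhash(tokens: Sequence[str], bits: int = 64) -> int:
    """Standard SimHash: per-bit majority vote over the token hashes."""
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def fuzzy_equal(a: int, b: int, max_distance: int = 3) -> bool:
    """Assumed fuzzy-equality rule: fingerprints within a small Hamming distance."""
    return hamming(a, b) <= max_distance


def lcs_length(xs: Sequence, ys: Sequence, eq: Callable) -> int:
    """Longest common subsequence length under a custom equality test."""
    prev = [0] * (len(ys) + 1)
    for x in xs:
        curr = [0]
        for j, y in enumerate(ys, start=1):
            if eq(x, y):
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simhash_lcs_similarity(
    trace_a: List[List[str]], trace_b: List[List[str]], max_distance: int = 3
) -> float:
    """Similarity of two behavior traces; each event is a list of string tokens.

    Dividing by the shorter length (an assumption) keeps the score meaningful
    when the two traces differ greatly in length.
    """
    if not trace_a or not trace_b:
        return 0.0
    prints_a = [simhash(event) for event in trace_a]
    prints_b = [simhash(event) for event in trace_b]
    common = lcs_length(
        prints_a, prints_b, lambda a, b: fuzzy_equal(a, b, max_distance)
    )
    return common / min(len(prints_a), len(prints_b))


if __name__ == "__main__":
    # Hypothetical traces: the second contains the first plus extra noise events.
    short = [["CreateFileW", "tmp.exe"], ["WriteFile", "tmp.exe"], ["CloseHandle"]]
    noisy = [short[0], ["Sleep", "100"], short[1], ["GetTickCount"], short[2]]
    print(simhash_lcs_similarity(short, noisy))
```

Under this normalization, a trace that survives intact inside a longer, noisier trace scores 1.0, because the extra events are simply skipped by the subsequence match. This mirrors the noise-filtering and unequal-length properties the abstract attributes to the LCS step.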
Keywords/Search Tags: Malware behavior analysis, Secret inline hook, SimHash-LCS, Machine learning, Gray extreme learning machine