
Malicious Code Detection Technology Based On Machine Learning Algorithm

Posted on: 2019-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: P B Zhu
Full Text: PDF
GTID: 2348330545455618
Subject: Computer Science and Technology
Abstract/Summary:
With the development and use of information sharing technology, malicious code is being developed and updated faster and faster, and unknown malicious code keeps emerging in an endless stream, which poses great challenges to analysts. Traditional detection methods based on feature matching cannot cope with the many variants of malicious code. In recent years, machine learning and data mining technologies have developed rapidly, and many studies have applied these techniques to real malicious code detection environments with good results. How to cope with the explosive growth of malicious code, and especially how to detect unknown malicious code and its variants, has become a research focus of malicious code detection technology.

This thesis first studies traditional malicious code detection technology based on machine learning algorithms and analyzes the shortcomings of the traditional methods. Most of these methods extract assembler opcodes as the feature representation. Because of the nature of assembler opcodes, a small number of opcodes can hardly represent the behavioral characteristics of a program. In recent years, some scholars have therefore attempted to abstract assembler opcodes in the hope of obtaining more meaningful behavioral characteristics.

Second, this thesis presents an improved malicious code detection method that incorporates abstraction operations. Many abstraction methods exist, and the problem this thesis sets out to solve is how to make the detection system always choose the most suitable one. First, intermediate code feature sequences are obtained by applying multiple abstractions. Next, the Eclat algorithm is used to mine the frequent itemsets of the intermediate code sequence features. Then the average similarity defined in this thesis is used as a measure to select the most appropriate intermediate code feature sequence. Finally, a probability matrix is constructed from the selected sequence to complete the feature representation of the malicious code.

The proposed method is then evaluated in simulation experiments using three metrics: accuracy, precision, and recall. The experimental results show that, compared with the features extracted by other similar methods, the features extracted by this method have a more pronounced effect on malicious code classification. Finally, an online detection system for malicious code is designed and implemented.
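
The frequent itemset step mentioned above can be illustrated with a minimal sketch of the general Eclat technique. It is not the thesis's implementation: the function name `eclat`, the `min_support` parameter, and the sample tokens are hypothetical, and each "transaction" is assumed to be the set of intermediate code tokens observed in one program sample.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return each itemset found in at least `min_support` transactions, with its support."""
    # Vertical layout: map every item to the ids of the transactions that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        # Every candidate (item, tidset) is already frequent together with `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect tidsets with the remaining candidates to grow the itemset.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    # Start from the frequent single items, in a fixed order so each itemset is visited once.
    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent


if __name__ == "__main__":
    # Hypothetical token sets; real input would come from the abstraction step.
    samples = [{"MOV", "ADD", "JMP"}, {"MOV", "ADD"}, {"MOV", "JMP"}, {"ADD", "JMP"}]
    for itemset, support in sorted(eclat(samples, 2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

Eclat works on a vertical layout, so the support of a larger itemset is obtained by intersecting transaction id sets rather than rescanning the data. How the resulting itemsets feed into the average similarity measure and the probability matrix is defined in the thesis itself and is not reproduced here.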
Keywords/Search Tags: machine learning, malicious code detection, frequent itemsets, probability matrix