Malicious Code Detection Technology Based On Machine Learning Algorithm

Posted on:2019-07-23

Degree:Master

Type:Thesis

Country:China

Candidate:P B Zhu

GTID:2348330545455618

Subject:Computer Science and Technology

Abstract/Summary:

With the development and widespread use of information-sharing technology, malicious code is being developed and updated at an ever faster pace, and unknown malicious code emerges in an endless stream, posing great challenges to analysts. Traditional detection methods based on feature matching are ineffective against the many variants of malicious code. In recent years, machine learning and data mining technologies have developed rapidly, and many studies have applied these techniques to practical malicious code detection with good results. Coping with the explosive growth of malicious code, and especially detecting unknown malicious code and its variants, has become a research focus of malicious code detection technology.

First, this paper studies traditional malicious code detection technology based on machine learning algorithms and analyzes its shortcomings. Most traditional methods extract assembly opcodes as the feature representation. Because of the nature of assembly opcodes, a small number of them can hardly represent the behavioral characteristics of a program. Therefore, in recent years some scholars have attempted to abstract assembly opcodes in the hope of obtaining more meaningful behavioral characteristics.

Second, this paper presents an improved malicious code detection method that incorporates abstraction operations. There are many kinds of abstraction methods, and the problem this paper sets out to solve is how to make the detection system always choose the most suitable one. First, intermediate code feature sequences are obtained by applying multiple abstractions. Next, the Eclat algorithm is used to mine the frequent itemsets of the intermediate code sequence features. Then, the average similarity defined in this paper is used as a measure to select the most appropriate intermediate code feature sequence. Finally, a probability matrix is constructed from this intermediate code feature sequence to complete the feature representation of the malicious code.

The proposed method is then evaluated in simulation experiments using three metrics: accuracy, precision, and recall. The experimental results show that, compared with features extracted by other similar methods, the features extracted by this method are more effective for malicious code classification. Finally, an online malicious code detection system is designed and implemented.
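
The frequent-itemset step mentioned in the abstract can be illustrated with a minimal Eclat sketch. This is not the thesis's implementation: the abstract does not say how transactions are built, so the example assumes each program is one transaction whose items are its (abstracted) opcode tokens. The function name `eclat`, the `min_support` parameter, and the sample opcodes are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least `min_support` transactions.

    Assumption: one transaction = the set of opcode tokens seen in one program.
    """
    # Vertical layout: map each item to the ids of the transactions containing it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        # Depth-first search: grow the prefix by one item at a time.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)  # support = size of the tidset
            # Intersect tidsets with the remaining items to get larger itemsets.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    # Start from the frequent single items, in a fixed order to avoid duplicates.
    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent


if __name__ == "__main__":
    # Hypothetical opcode sets for four programs.
    samples = [
        {"MOV", "PUSH", "CALL"},
        {"MOV", "PUSH"},
        {"MOV", "CALL"},
        {"PUSH", "CALL", "MOV"},
    ]
    for itemset, support in eclat(samples, min_support=3).items():
        print(sorted(itemset), support)
```

Eclat works on tidset intersections rather than repeated scans of the data, so the support of a larger itemset is obtained directly from the supports of its parts. How the thesis turns the resulting itemsets into the average similarity measure and the probability matrix is not specified in the abstract, so those steps are not sketched here.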

Keywords/Search Tags:

machine learning, malicious code detection, frequent itemsets, probability matrix

Related items

1	Research On Algorithm For Mining Frequent Itemsets Of Uncertain Data
2	Research On Top-K Frequent Itemsets Datamining Algorithm
3	An Algorithm And Context Analysis Of Mining Frequent Closet Itemsets
4	Design And Implementation Of JavaScript Malicious Code Detection Model Based On Machine Learning
5	Design And Implementation Of JavaScript Malicious Code Detection Technology Based On Machine Learning
6	FP-Tree Based Mining Frequent Itemsets Over Data Streams
7	Research On Mining Algorithms Of Maximal Frequent Itemsets And Opened Frequent Itemsets
8	Research On Multi-objective Restricted Boltzmann Machine Model For Malicious Code Detection
9	Research On Malicious Code Detection Method Based On Incremental Learning
10	The Research And Implementation Of Mining Frequent Itemsets Algorithm Over Streaming Data