Research And Improvement On Classification Of Homologous Binary Malware

Posted on: 2021-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: J J Li
Full Text: PDF
GTID: 2428330647950741
Subject: Computer technology
Abstract/Summary:
Malware is malicious software that performs unauthorized, harmful actions on user devices [1], and it is an extremely common security threat in the information age. A series of studies has therefore addressed malware feature extraction and classification. Code obfuscation can greatly change the syntactic characteristics of malware [2], producing a series of homologous malware variants. As a result, the number of malware variants has increased exponentially [3], which makes classifying homologous malware more complicated and difficult. The main research question of this work is therefore how to classify homologous malicious binaries more effectively under interference from a limited range of code obfuscation techniques. We analyze the key points and difficulties of this problem and, in view of the shortcomings of previous related work, propose a solution for classifying homologous malware. Our main work is as follows:

1. We summarize previous work on malware classification. Depending on whether the target software needs to be executed, related work can be divided into static analysis methods and dynamic analysis methods. Because of runtime code packing [13], it is very challenging for static analysis to obtain identifying features of homologous malware, so we focus mainly on dynamic analysis methods. The core of a dynamic analysis method lies in how it defines identifying features. This thesis surveys existing dynamic analysis methods, traces how the definition of identifying features has evolved, and explains the deficiencies of related work. The feature definitions in [28-31][16] are all based on the system call dependency graph; they essentially consider only the process by which the software operates, which makes them susceptible to some system call obfuscation techniques [33]. The feature definition of Feature Set [8] essentially considers the result of the software's operations on specified objects; it can handle some system call obfuscation that is difficult for work based on system call dependency graphs, but its classification is not accurate enough for some non-homologous malware. The goal of our work is therefore to improve the definition of identifying features so that it resolves this inaccuracy while still resisting a limited range of code obfuscation techniques.

2. To address the insufficient classification accuracy of related work, we improve the definition of semantic features, using the data flow system call sequence, and give a corresponding data flow system call sequence alignment algorithm. With this feature definition and algorithm, the longest common subsequence of a collection of homologous malware is obtained as its malicious semantic feature, and this feature is then used to classify the sample under test. The scenario assumed in our work is that a collection of known homologous malware families is available, and the task is to determine whether an unknown software sample is homologous to any family in the collection. The code obfuscation techniques assumed to be used by the malware and the samples under test are basic block obfuscation, control flow obfuscation, useless system call insertion, and system call dependency obfuscation. Our solution extracts the semantic features of the unknown sample by dynamically executing it and recording the system call sequence and related executed instructions, and compares them with the malicious semantic features previously obtained from the homologous malware to reach a classification decision. This makes up for the shortcoming of previous related work, which cannot correctly classify some non-homologous malware.

3. Based on the Intel instrumentation tool Pin, we implement a dynamic execution environment that obtains the system call sequence and executed instruction sequence of the sample under test. We also implement a prototype system and run experiments on a collection of real malware samples. The experimental results show that the data flow system call sequence and its alignment algorithm can extract the semantic features of homologous malicious binaries under interference from code obfuscation, use these features to classify unknown samples, and resolve part of the difficulty that previous related work encountered in classifying homologous binary malware.
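The idea of using a common subsequence of system call traces as a family feature can be illustrated with a minimal Python sketch. It is not the thesis's data flow system call sequence alignment algorithm, whose details the abstract does not give. It assumes each trace is a plain list of system call names, approximates the common subsequence of a family by folding a pairwise longest-common-subsequence step over the traces (a heuristic, not an exact multi-sequence solution), and treats a sample as homologous when the family signature appears in its trace as a subsequence. The function names and the example traces are hypothetical.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two sequences (dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one longest common subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def family_signature(traces):
    """Assumed heuristic: fold pairwise LCS over all traces of one family."""
    return reduce(lcs, traces)


def is_subsequence(signature, trace):
    """True if every call in signature occurs in trace, in order."""
    it = iter(trace)
    return all(call in it for call in signature)


def classify(trace, signatures):
    """Return the first family whose non-empty signature is embedded in trace."""
    for family, signature in signatures.items():
        if signature and is_subsequence(signature, trace):
            return family
    return None


# Hypothetical traces of two variants of one family, each with an inserted call.
family_a = [
    ["open", "read", "noise", "connect", "send", "close"],
    ["open", "getpid", "read", "connect", "send", "close"],
]
signatures = {"A": family_signature(family_a)}
print(signatures["A"])
print(classify(["open", "sleep", "read", "connect", "send", "close"], signatures))
print(classify(["open", "write", "close"], signatures))
```

In this sketch the inserted calls (`noise`, `getpid`, `sleep`) drop out of the signature or are skipped during matching, which loosely mirrors resistance to useless system call insertion. Handling the data flow between calls, as the thesis describes, would need richer trace elements than bare call names.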
Keywords/Search Tags:malware classification, code obfuscation, dynamic execution