Research On Malware Dynamic Behaviors Knowledge Graph Embedding

Posted on: 2022-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2518306569494844
Subject: Computer Science and Technology
Abstract/Summary:
With the development of Internet information technology and the growing convenience of network communication and information exchange, malware spreads ever faster, and its growth is accelerating in both the number of samples and the number of types. Under such rapid growth, traditional signature-based detection techniques struggle to keep pace with new malware. With the rise of machine learning, a steady stream of machine-learning-based malware analysis and detection methods has been proposed. Current research in this field focuses mainly on the feature representation and classification of malware. API calls and opcodes are generally used to represent malware behavior, and control flow graphs built on APIs and opcodes have also been proposed. However, these low-level features are hard to read, which makes it inconvenient for security analysts to understand malware behavior.

For this reason, this paper uses as its data source the behavior reports produced by running malware in a security sandbox. A behavior report directly reflects the additions, deletions, modifications, and insertions that the malware performs on the operating system, and is therefore more readable. Based on the characteristics of the behavior report data, this paper proposes Individual-Head, a knowledge graph structure centered on individual malware samples. In the Individual-Head structure, however, different individuals of the same family repeat many identical behaviors, which makes the graph structure redundant. To reduce this redundancy and obtain a more effective graph representation of family behavior, we improve the Individual-Head structure and propose the Family-Head knowledge graph structure. Experiments show that Family-Head effectively reduces the behavioral redundancy among different individuals of the same family and that its knowledge graph structure represents family behavior more effectively.

Although the knowledge graph is more readable, it is a discrete feature, whereas machine learning methods generally require continuous vectors as input. To this end, building on general knowledge graph embedding methods, this paper proposes a set of methods for representing individuals and families as embedding vectors, tailored to the characteristics of the proposed Individual-Head and Family-Head structures. In family classification tasks, the Family-Head structure and its corresponding individual embedding vectors outperform the Individual-Head structure. Further experiments show that the Family-Head structure and its embedding features perform at a level comparable to current research in family classification. In the experimental analysis, the embedding vectors are visualized and the distribution of the features across samples is analyzed. This paper also constructs a metric based on the distance between nodes in the graph and the connectivity of the graph, and uses it to quantitatively analyze why the Family-Head graph structure and its embedding vector features perform better.
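The abstract does not spell out the exact schema of either graph, so the following is only a minimal sketch under stated assumptions. It assumes each sandbox report has already been parsed into a set of hypothetical `(action, target)` pairs, and it interprets Family-Head as "behaviors common to every sample of a family are stored once on the family node, while the remaining behaviors stay on the sample". The functions `individual_head` and `family_head`, the `belongs_to` relation, and all sample, family, and behavior names are illustrative inventions, not the thesis's own.

```python
from collections import defaultdict

def individual_head(reports):
    """Individual-Head sketch: each sample is the head of all of its own behavior triples."""
    triples = set()
    for sample, (family, behaviors) in reports.items():
        triples.add((sample, "belongs_to", family))  # hypothetical membership relation
        for action, target in behaviors:
            triples.add((sample, action, target))
    return triples

def family_head(reports):
    """Family-Head sketch (assumed rule): behaviors common to every member live on the family node once."""
    members = defaultdict(list)
    for sample, (family, behaviors) in reports.items():
        members[family].append((sample, behaviors))
    triples = set()
    for family, group in members.items():
        # Behaviors that every sample of this family exhibits.
        shared = set.intersection(*(behaviors for _, behaviors in group))
        for action, target in shared:
            triples.add((family, action, target))
        for sample, behaviors in group:
            triples.add((sample, "belongs_to", family))
            # Only behaviors not already stored on the family node stay on the sample.
            for action, target in behaviors - shared:
                triples.add((sample, action, target))
    return triples

# Hypothetical parsed sandbox reports: sample id -> (family label, {(action, target), ...}).
reports = {
    "s1": ("famA", {("create_file", "C:/tmp/a.exe"), ("set_reg", "Run/a")}),
    "s2": ("famA", {("create_file", "C:/tmp/a.exe"), ("set_reg", "Run/a"),
                    ("connect", "10.0.0.1")}),
    "s3": ("famB", {("delete_file", "C:/log.txt")}),
}
print(len(individual_head(reports)), len(family_head(reports)))  # 9 7
```

In this toy input the two behaviors repeated by `s1` and `s2` are stored once on `famA`, so the triple count drops from 9 to 7. This mirrors the kind of redundancy reduction the abstract attributes to Family-Head, but the rule the thesis actually uses to move behaviors to the family node may differ.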
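The abstract says only that the individual and family vectors build on general knowledge graph embedding methods and does not name a model. The sketch below therefore swaps in TransE, whose score for a triple (h, r, t) is the distance ||h + r - t||, and assumes the entity and relation vectors have already been trained. The aggregation in `individual_vector` (averaging t - r over a sample's triples, since TransE pushes h toward t - r) and the family vector as the mean of its members are hypothetical choices for illustration, as are the toy 2-D vectors.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]
Triple = Tuple[str, str, str]

def transe_score(h: Vec, r: Vec, t: Vec) -> float:
    """TransE distance ||h + r - t||_2: lower means the triple is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def mean(vectors: List[Vec]) -> Vec:
    """Component-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def individual_vector(sample: str, triples: List[Triple],
                      ent: Dict[str, Vec], rel: Dict[str, Vec]) -> Vec:
    """Hypothetical aggregation: TransE implies h ~ t - r, so average t - r over the sample's triples."""
    estimates = [[ti - ri for ti, ri in zip(ent[t], rel[r])]
                 for h, r, t in triples if h == sample]
    return mean(estimates)

def family_vector(members: List[str], triples: List[Triple],
                  ent: Dict[str, Vec], rel: Dict[str, Vec]) -> Vec:
    """Hypothetical family vector: the mean of its members' individual vectors."""
    return mean([individual_vector(s, triples, ent, rel) for s in members])

# Toy 2-D embeddings standing in for trained TransE vectors (values are made up).
ent = {"s1": [0.0, 0.5], "famA": [1.0, 0.0], "a.exe": [0.0, 2.0], "10.0.0.1": [2.0, 2.0]}
rel = {"belongs_to": [1.0, 0.0], "create_file": [0.0, 1.0], "connect": [1.0, 1.0]}
triples = [("s1", "belongs_to", "famA"), ("s1", "create_file", "a.exe"),
           ("s2", "belongs_to", "famA"), ("s2", "connect", "10.0.0.1")]

print(individual_vector("s1", triples, ent, rel))                 # [0.0, 0.5]
print(individual_vector("s2", triples, ent, rel))                 # [0.5, 0.5]
print(family_vector(["s1", "s2"], triples, ent, rel))             # [0.25, 0.5]
print(transe_score(ent["s1"], rel["create_file"], ent["a.exe"]))  # 0.5
```

Averaging per-triple estimates turns a variable number of discrete graph facts into one fixed-length continuous vector, which is the property the abstract says downstream classifiers need. A real pipeline would train the embeddings first, for example with a margin-based ranking loss over corrupted triples.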
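The abstract mentions a metric built on node distances and graph connectivity but does not give its formula. As a hypothetical illustration only, the sketch below treats triples as undirected edges. It compares the mean hop distance between samples of the same family with the mean hop distance between samples of different families, and it also counts connected components. The graphs follow the assumed Individual-Head and Family-Head layouts, and all node names are invented.

```python
from collections import defaultdict, deque

def adjacency(triples):
    """Treat each (head, relation, tail) triple as an undirected edge head <-> tail."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return dict(adj)

def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def mean_distance(adj, pairs):
    """Average hop distance over the given node pairs, skipping unreachable pairs."""
    found = [hop_distances(adj, a).get(b) for a, b in pairs]
    found = [d for d in found if d is not None]
    return sum(found) / len(found) if found else float("inf")

def connected_components(adj):
    """Number of connected components, found by repeated BFS."""
    seen, count = set(), 0
    for node in adj:
        if node not in seen:
            count += 1
            seen.update(hop_distances(adj, node))
    return count

# Hypothetical graphs: s1, s2 belong to famA; s3 belongs to famB; all query HKLM/x.
individual_head = [
    ("s1", "belongs_to", "famA"), ("s1", "create_file", "a.exe"), ("s1", "query_reg", "HKLM/x"),
    ("s2", "belongs_to", "famA"), ("s2", "create_file", "a.exe"), ("s2", "query_reg", "HKLM/x"),
    ("s3", "belongs_to", "famB"), ("s3", "query_reg", "HKLM/x"),
]
family_head = [
    ("famA", "create_file", "a.exe"), ("famA", "query_reg", "HKLM/x"),
    ("s1", "belongs_to", "famA"), ("s2", "belongs_to", "famA"),
    ("famB", "query_reg", "HKLM/x"), ("s3", "belongs_to", "famB"),
]
intra, inter = [("s1", "s2")], [("s1", "s3"), ("s2", "s3")]
for name, triples in [("Individual-Head", individual_head), ("Family-Head", family_head)]:
    adj = adjacency(triples)
    print(name, mean_distance(adj, intra), mean_distance(adj, inter), connected_components(adj))
# Individual-Head 2.0 2.0 1
# Family-Head 2.0 4.0 1
```

In this toy case, a behavior shared across families sits two hops from every sample under Individual-Head, so samples from different families are as close as samples from the same family. Under Family-Head the cross-family path must pass through both family nodes, which doubles the inter-family distance while leaving intra-family distance unchanged. This is one plausible way a distance-based metric could explain better family separation, not the thesis's actual measure.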
Keywords/Search Tags: malware, knowledge graph embedding, machine learning, representation learning