Malware is one of the main threats to network security. With the development of the Internet, the amount of malware has grown dramatically, and most new malware samples are variants of previously seen malware. Malware similarity measurement can detect such variants, help uncover high-value family features, and reduce the amount of manual annotation, which gives it considerable practical value. Existing malware similarity measurement methods mainly abstract disassembled code into structured program sequences or structured program graphs, which are easily affected by techniques such as code obfuscation. To address these shortcomings, this paper proposes two new similarity measurement methods and designs and implements a Windows malware similarity analysis system that provides malware detection and similarity analysis services. The main contributions of this paper are as follows:

(1) Existing malware similarity measures are susceptible to obfuscation and cannot characterize complex relationships between malware samples. To address this, this paper proposes RG-MHPE (API Relation Graph enhanced Multiple Heterogeneous ProxEmbed), a malware similarity measurement method based on multiple heterogeneous graphs. The method first constructs multiple heterogeneous graphs from the dynamic and static features of malware, and then introduces an enhanced proximity embedding method based on relation paths, which overcomes the limitation that proximity embedding cannot be applied directly to similarity measurement over multiple heterogeneous graphs. In addition, this paper extracts knowledge from the Windows API documentation on the MSDN website, builds an API relation graph, and learns similarity relationships between Windows APIs, which effectively slows the aging of the similarity measurement model. Comparative experiments verify that RG-MHPE performs best among the compared methods in both similarity measurement and resistance to model aging.

(2) Traditional malware similarity measurement methods that use function call graphs usually rely on graph matching, which is inefficient and generalizes poorly. To address this, this paper proposes FunctionSim, a malware similarity measurement method based on function call graph similarity learning. The method extracts the function call graph of each malware sample, analyzes the characteristics of these graphs, and improves an existing graph-neural-network-based graph similarity measurement method, achieving good malware similarity measurement results.

(3) This paper designs and implements a Windows malware similarity analysis system built around the proposed similarity measurement methods, which provides malware detection and similarity analysis services. The requirements analysis, high-level design, and detailed design of the system are described in detail, and the system's functionality is fully verified through black-box testing.
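The abstract does not give the details of RG-MHPE's relation-path embedding, so the sketch below uses a much simpler, well-known stand-in to illustrate the two underlying ideas: similarity along relation paths in several heterogeneous graphs, and an API relation graph that lets related APIs count as matches. It computes PathSim (a classic meta-path similarity for heterogeneous graphs) for the paths Malware-API-API-Malware and Malware-DLL-Malware and averages them. All sample names, API/DLL sets, and the "related API" pairs (here, ANSI/Unicode variants) are hypothetical toy data, and this is not the paper's learned model.

```python
from itertools import product
from statistics import mean

# Hypothetical toy data: APIs called by each sample (Malware -> API edges).
CALLS = {
    "sample_a": {"CreateFileW", "WriteFile", "RegSetValueExW"},
    "sample_b": {"CreateFileA", "WriteFile", "RegSetValueExA"},
    "sample_c": {"socket", "connect", "send"},
}
# Hypothetical toy data: DLLs imported by each sample (Malware -> DLL edges).
IMPORTS = {
    "sample_a": {"kernel32.dll", "advapi32.dll"},
    "sample_b": {"kernel32.dll", "advapi32.dll"},
    "sample_c": {"ws2_32.dll", "kernel32.dll"},
}
# Hypothetical API relation graph: unordered pairs of related APIs.
API_RELATED = {
    frozenset(("CreateFileW", "CreateFileA")),
    frozenset(("RegSetValueExW", "RegSetValueExA")),
}


def exact(a, b):
    """Plain matching: two nodes match only if they are identical."""
    return 1 if a == b else 0


def api_related(a, b):
    """Matching through the (hypothetical) API relation graph."""
    return 1 if a == b or frozenset((a, b)) in API_RELATED else 0


def path_count(x, y, edges, relate):
    """Number of path instances x -> node -relate- node <- y."""
    return sum(relate(a, b) for a, b in product(edges[x], edges[y]))


def pathsim(x, y, edges, relate=exact):
    """PathSim: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = path_count(x, x, edges, relate) + path_count(y, y, edges, relate)
    return 2 * path_count(x, y, edges, relate) / denom if denom else 0.0


# One entry per relation path, i.e. per heterogeneous graph.
RELATION_PATHS = [
    (CALLS, api_related),  # Malware-API-API-Malware
    (IMPORTS, exact),      # Malware-DLL-Malware
]


def combined_similarity(x, y):
    """Average PathSim over all relation paths."""
    return mean(pathsim(x, y, edges, relate) for edges, relate in RELATION_PATHS)


print(f"{pathsim('sample_a', 'sample_b', CALLS):.2f}")         # → 0.33
print(f"{combined_similarity('sample_a', 'sample_b'):.2f}")    # → 1.00
print(f"{combined_similarity('sample_a', 'sample_c'):.2f}")    # → 0.25
```

With exact API matching, sample_a and sample_b share only `WriteFile` and score 0.33; once the relation graph treats the W/A variants as related, they become indistinguishable along that path. This is the intuition behind using API relationships to keep a model robust when new variants swap in equivalent APIs.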
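FunctionSim's exact network architecture and training procedure are not described in this abstract. The sketch below therefore shows only the general pattern that GNN-based graph similarity methods share: embed each function call graph by message passing over caller/callee edges, pool the node states into one vector, and compare the vectors with cosine similarity. The per-function features, the two-layer update rule, and the seeded random weights are all assumptions. In a real system the weights would be learned (for example, with a Siamese objective on labeled sample pairs), so the scores here are only illustrative.

```python
import math
import random


def make_weights(dim, seed):
    """Deterministic random dim x dim matrix (stands in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def vsum(vectors, dim):
    """Element-wise sum of vectors; a zero vector if there are none."""
    total = [0.0] * dim
    for vec in vectors:
        for i, x in enumerate(vec):
            total[i] += x
    return total


def embed_call_graph(features, calls, layers=2, seed=0):
    """Embed a function call graph with a small message-passing network.

    features: dict function_name -> feature vector (list of floats)
    calls:    list of (caller, callee) edges
    Every graph must use the same seed so that all graphs share weights.
    """
    dim = len(next(iter(features.values())))
    w_self, w_callee, w_caller = (make_weights(dim, seed + i) for i in range(3))
    callees = {f: [] for f in features}
    callers = {f: [] for f in features}
    for caller, callee in calls:
        callees[caller].append(callee)
        callers[callee].append(caller)

    h = dict(features)
    for _ in range(layers):
        # Each function combines its own state with the summed states of the
        # functions it calls and the functions that call it.
        h = {
            f: [
                math.tanh(a + b + c)
                for a, b, c in zip(
                    matvec(w_self, h[f]),
                    matvec(w_callee, vsum((h[g] for g in callees[f]), dim)),
                    matvec(w_caller, vsum((h[g] for g in callers[f]), dim)),
                )
            ]
            for f in h
        }
    # Mean pooling makes the graph embedding independent of function names.
    return [x / len(h) for x in vsum(h.values(), dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical per-function features: [normalized size, API-call ratio, is_entry].
SAMPLE_1 = {"main": [1.0, 0.3, 1.0], "sub_401000": [0.4, 0.1, 0.0],
            "sub_402000": [0.2, 0.5, 0.0]}
CALLS_1 = [("main", "sub_401000"), ("main", "sub_402000"),
           ("sub_401000", "sub_402000")]
# Same structure with renamed functions, as after relinking at new addresses.
SAMPLE_2 = {"start": [1.0, 0.3, 1.0], "sub_501000": [0.4, 0.1, 0.0],
            "sub_502000": [0.2, 0.5, 0.0]}
CALLS_2 = [("start", "sub_501000"), ("start", "sub_502000"),
           ("sub_501000", "sub_502000")]

sim = cosine(embed_call_graph(SAMPLE_1, CALLS_1), embed_call_graph(SAMPLE_2, CALLS_2))
print(f"{sim:.3f}")  # → 1.000
```

Comparing two fixed-length embeddings is a single vector operation, whereas classic graph matching must search for node correspondences between every pair of graphs. This difference is the efficiency argument the abstract makes against matching-based methods.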