| Similarity analysis of malware is fundamental to malware protection, detection, and research into homologous relationships between samples. Malware development increasingly exhibits heavy module reuse and rapid variant evolution, which makes similarity analysis well suited to detailed offline analysis. This paper proposes a similarity measure based on the malware function-call graph. The method exploits both instruction-sequence information and the structural information of function-call relations, and measures the degree of similarity between malware samples by calculating the cost of matching and transforming one function-call graph into another. The paper also describes the design and implementation of a prototype system. After disassembling the malware, we normalize and store the instruction mnemonic sequences and the function-call graph extracted from IDA Pro with IDA plugins. We use a pairwise sequence alignment algorithm to compare instruction-sequence similarity, and we calculate the similarity of function-call relations from the adjacency matrix. Based on these similarities, we construct a complete bipartite graph for each pair of graphs being compared, calculate the weight of each edge, and find the best match and the maximum weight using the Kuhn-Munkres algorithm. In this way, the similarity of malware is measured by calculating the minimum cost of graph matching. Borrowing the method used to construct phylogenetic trees in bioinformatics, we apply the UPGMA method to hierarchically cluster the resulting similarity data and construct an aggregation tree that presents the similarity relations among malware samples. The function-call graph is a structural representation known to be less susceptible to code-level obfuscation. By comparing samples on this representation, the method takes full advantage of the inherent correlation between the underlying structural features of malware functions and their instruction information, and improves the accuracy of comparison. It characterizes functions and function-call relations in greater depth and effectively reflects the similarity relations of malware, which contributes to understanding the relationships between malware families and the evolution and development of malware. |
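
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the prototype's implementation: the scoring parameters, the normalization to [0, 1], the final score (matched weight divided by the larger function count), and all function names are assumptions made for illustration. The edge weight here uses only mnemonic-sequence similarity (a Needleman-Wunsch global alignment), and the call-relation term computed from the adjacency matrix is omitted.

```python
def alignment_similarity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score, scaled to [0, 1] (assumed scaling)."""
    n, m = len(a), len(b)
    if n == 0 and m == 0:
        return 1.0
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return max(0, prev[m]) / (match * max(n, m))


def kuhn_munkres_min(cost):
    """Minimum-cost perfect assignment on a square matrix; returns (row, col) pairs."""
    n = len(cost)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)   # dual potentials
    p, way = [0] * (n + 1), [0] * (n + 1)     # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                              # augment along the found path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, n + 1)]


def match_functions(funcs_a, funcs_b):
    """Match functions of two samples; inputs map function name -> mnemonic list."""
    names_a, names_b = list(funcs_a), list(funcs_b)
    n, m = len(names_a), len(names_b)
    size = max(n, m)
    # Complete bipartite graph as a weight matrix, zero-padded to a square.
    w = [[0.0] * size for _ in range(size)]
    for i, fa in enumerate(names_a):
        for j, fb in enumerate(names_b):
            w[i][j] = alignment_similarity(funcs_a[fa], funcs_b[fb])
    # Maximum-weight matching is minimum-cost matching on negated weights.
    assignment = kuhn_munkres_min([[-x for x in row] for row in w])
    pairs = sorted((names_a[i], names_b[j], w[i][j])
                   for i, j in assignment if i < n and j < m)
    score = sum(x for _, _, x in pairs) / size if size else 1.0
    return pairs, score


sample_a = {"f1": ["push", "mov", "call", "ret"], "f2": ["xor", "jmp"]}
sample_b = {"g1": ["xor", "jmp"], "g2": ["push", "mov", "call", "ret"], "g3": ["nop"]}
pairs, score = match_functions(sample_a, sample_b)
print(pairs, round(score, 3))
```

In this toy input, the two functions of `sample_a` have exact counterparts in `sample_b`, while `g3` stays unmatched and lowers the overall score. A fuller version would blend a structural term into each edge weight before matching.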