
Research And Implementation Of Binary Function Similarity Algorithm Based On Graph Neural Network

Posted on: 2020-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: P L Zhao
Full Text: PDF
GTID: 2428330572496558
Subject: Computer technology
Abstract/Summary:
Binary function similarity detection measures how similar two binary functions are, even when they come from different platforms, compilers, optimizers, or software versions. It plays a critical role in fields such as cybersecurity and intellectual property protection, because it enables binary code analysis without access to the corresponding source code. Many problems, including malware analysis, vulnerability detection, and copyright infringement detection, can be reduced to binary function similarity detection.

Some existing approaches rely on approximate graph-matching algorithms, which are inevitably slow, sometimes inaccurate, and hard to adapt to new tasks. Other approaches are based on Graph Neural Networks (GNNs). These approaches first transform binary code into multi-dimensional vector representations (embeddings) and then compare the vectors using simple and efficient numerical operations. However, the embeddings are usually derived from an Attributed Control Flow Graph (ACFG) whose basic-block attributes are obtained by manual feature extraction, which fails to capture important function characteristics. In addition, a GNN without an attention mechanism cannot learn some important features of the graph.

To address these issues, this work proposes a novel method for computing basic-block attributes that uses unsupervised feature extraction to avoid human bias and improve portability. Building on the structure2vec network, we propose a novel GNN with an attention mechanism to compute function embeddings. The attention mechanism automatically learns the weights of different nodes in the ACFG. Function similarity can then be detected efficiently by measuring the distance between the embeddings of the two functions.

The main contributions of this thesis are as follows:

1. We propose FuncSim, a general binary similarity detection framework that is highly modular, scalable, and compatible with existing detection methods. FuncSim consists of three modules: a flow graph extraction module, a basic-block feature learning module, and a graph embedding network.

2. We design and implement an unsupervised feature learning method based on instruction normalization. The method is borrowed from natural language processing and aids the learning of instruction features, while instruction normalization reduces the specificity of instructions. Compared with manual feature extraction, detection performance measured by AUC (Area Under Curve) improves by about 5%.

3. We propose and evaluate a novel GNN with an attention mechanism. Compared with the traditional structure2vec, the network not only learns neighborhood information iteratively but also assigns different weights to different nodes in a neighborhood, which improves the accuracy of the graph embedding. Compared with structure2vec, identification performance (AUC) improves by about 3%.
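Two steps described in the abstract, weighting the nodes of a neighborhood with attention and comparing function embeddings by distance, can be illustrated with a minimal Python sketch. It is not the FuncSim implementation: the abstract does not specify the attention scoring function, the update rule, or the distance measure, so the dot-product scores, the parameter-free tanh update, the sum pooling, and the cosine similarity used here are all assumptions. A trained network would use learned weight matrices where this sketch uses none, and every function name below is hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax; the returned weights sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(node, neighbors, states):
    """Attention-weighted sum of neighbor states.

    Assumption: scores are plain dot products between the node state and
    each neighbor state. A learned scoring function would go here.
    """
    dim = len(states[node])
    if not neighbors:
        return [0.0] * dim
    weights = softmax([dot(states[node], states[n]) for n in neighbors])
    return [sum(w * states[n][i] for w, n in zip(weights, neighbors))
            for i in range(dim)]


def embed_graph(features, edges, rounds=2):
    """Embed a graph given {node: feature vector} and {node: [neighbors]}.

    Each round recomputes every node state from its own features plus the
    attention-weighted neighbor states (no learned parameters, by assumption).
    The graph embedding is the element-wise sum of the final node states.
    """
    states = {n: list(f) for n, f in features.items()}
    for _ in range(rounds):
        updated = {}
        for n, f in features.items():
            agg = attention_aggregate(n, edges.get(n, []), states)
            updated[n] = [math.tanh(x + a) for x, a in zip(f, agg)]
        states = updated
    dim = len(next(iter(features.values())))
    return [sum(s[i] for s in states.values()) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine similarity of two embeddings; 0.0 if either is the zero vector."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


# Toy ACFGs: three basic blocks with 2-dimensional attributes (made-up values).
func_a = {"b0": [1.0, 0.0], "b1": [0.0, 1.0], "b2": [0.5, 0.5]}
func_b = {"b0": [0.9, 0.1], "b1": [0.1, 0.8], "b2": [0.4, 0.6]}
cfg_edges = {"b0": ["b1", "b2"], "b1": ["b2"], "b2": []}

score = cosine_similarity(embed_graph(func_a, cfg_edges),
                          embed_graph(func_b, cfg_edges))
print(f"similarity: {score:.3f}")
```

Because each function is reduced to one fixed-length vector, comparing two functions costs a single vector operation rather than a graph-matching computation, which is the efficiency argument the abstract makes for embedding-based approaches.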
Keywords/Search Tags: binary function similarity, unsupervised learning, graph neural network, self-attention