
Binary Code Similarity Analysis Techniques For Vulnerability Detection

Posted on: 2022-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: L R Cheng
GTID: 2518306572451084
Subject: Cyberspace security
Abstract/Summary:
The amount of open-source software keeps growing. Developers often reuse open-source packages to improve development efficiency, so any vulnerabilities those packages contain are reused along with them. Some serious vulnerabilities can cause huge losses, and once code has been reused many times, the reach of a single vulnerability can expand exponentially. This makes it urgent to detect, by technical means, whether software contains known vulnerabilities, and to minimize the impact of latent flaws on program operation.

Vulnerability detection is essentially a code similarity matching task. Most existing vulnerability detection methods built on traditional techniques rely on manually extracted vulnerability features, which gives them low scalability and high time complexity. Deep learning has been widely adopted in many fields, such as natural language processing, because of its strong learning and representation ability. Assembly code shares many characteristics with natural language, so applying models from natural language processing is a promising way to approach binary code problems.

In this paper, we design a vulnerability detection engine based on binary code similarity analysis. The engine uses a two-layer detection mode, at the basic-block level and at the function level. It introduces a pre-trained model from the natural language processing field to extract the semantic features of basic blocks, and it provides two models that extract semantic and structural features for different types of functions. The function feature extraction model based on a graph neural network has high time complexity, and we find that simply aggregating function information with a multi-layer perceptron already achieves high accuracy for functions with few basic blocks and a simple structure. Therefore, this paper extracts function feature vectors with a simple multi-layer perceptron model for small functions with a simple structure, and with a graph neural network-based semantic and structural feature extraction model for large functions with a complex structure. Finally, a similarity metric (such as Euclidean distance or cosine similarity) measures the similarity between functions, and once a threshold is set, suspicious matches against the vulnerability library are identified.

We conduct binary code similarity experiments on seven open-source software projects across compiler types and optimization levels; the results show that accuracy improves by about 10% over the existing model asm2vec. Finally, we apply the vulnerability retrieval engine to the public vulnerability library (ESH) to detect two publicly disclosed vulnerabilities (CVEs), and in both cases the top-10 matching results include the corresponding vulnerability from the library.
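The function-level pipeline described above (size-based routing to an MLP or a graph model, then similarity scoring, thresholding and top-10 retrieval) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The basic-block vectors are toy inputs standing in for the pre-trained language model's output. The block-count cut-off `SMALL_FUNCTION_MAX_BLOCKS`, the weight matrices, the library entry names and the distance-to-similarity mapping `euclidean_similarity` are all hypothetical. The graph path uses simple sum-aggregation message passing over the control-flow graph in place of the thesis's own graph neural network.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]

# Hypothetical cut-off: functions with at most this many basic blocks take the
# MLP path; larger ones take the graph path. The abstract gives no value.
SMALL_FUNCTION_MAX_BLOCKS = 5


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def mean_pool(vectors: Matrix) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mlp_embed(blocks: Matrix, w1: Matrix, w2: Matrix) -> Vector:
    """Small function: average the block vectors, then apply a 2-layer perceptron."""
    return matvec(w2, relu(matvec(w1, mean_pool(blocks))))


def gnn_embed(blocks: Matrix, edges: Sequence[Tuple[int, int]],
              w: Matrix, rounds: int = 2) -> Vector:
    """Large function: sum-aggregation message passing over the CFG, then mean-pool."""
    states = [list(b) for b in blocks]
    for _ in range(rounds):
        msgs = [[0.0] * len(s) for s in states]
        for src, dst in edges:
            # Treat CFG edges as undirected so information flows both ways.
            for i in range(len(states[src])):
                msgs[dst][i] += states[src][i]
                msgs[src][i] += states[dst][i]
        # Each block's new state: ReLU(W . (own state + sum of neighbour states)).
        states = [relu(matvec(w, [s + m for s, m in zip(st, mg)]))
                  for st, mg in zip(states, msgs)]
    return mean_pool(states)


def embed_function(blocks: Matrix, edges: Sequence[Tuple[int, int]],
                   w1: Matrix, w2: Matrix, w_gnn: Matrix) -> Vector:
    """Route a function to the MLP or graph path by its basic-block count."""
    if len(blocks) <= SMALL_FUNCTION_MAX_BLOCKS:
        return mlp_embed(blocks, w1, w2)
    return gnn_embed(blocks, edges, w_gnn)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed mapping from distance to a similarity score in (0, 1].
    return 1.0 / (1.0 + euclidean_distance(a, b))


def search(query: Sequence[float], library: Dict[str, Vector],
           threshold: float = 0.8, top_k: int = 10,
           similarity: Callable[[Sequence[float], Sequence[float]], float] = cosine_similarity,
           ) -> List[Tuple[str, float]]:
    """Return up to top_k (name, score) pairs whose score meets the threshold, best first."""
    scored = [(name, similarity(query, vec)) for name, vec in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]
```

With 2x2 identity weights, a two-block function such as `[[1.0, 0.0], [0.0, 1.0]]` embeds to `[0.5, 0.5]`. Searching a hypothetical library `{"vuln_func_a": [1.0, 1.0], "benign_func_b": [1.0, -1.0]}` with a 0.8 cosine threshold then returns only `vuln_func_a`. Passing `similarity=euclidean_similarity` swaps in a distance-based score; the threshold must then be retuned for that scale.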
Keywords: Binary Code, Vulnerability Detection, Attention Mechanism, Graph Neural Network