Semantic Learning Based Binary Vulnerability Code Clone Detection

Posted on: 2020-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Yang
Full Text: PDF
GTID: 2428330626964652
Subject: Software engineering
Abstract/Summary:
In recent years, widespread code reuse and the use of third-party open source libraries have left software with a large number of similar code segments. Once a vulnerability exists in one of these segments, every program that shares it faces a high security risk. Current research on vulnerability code clone detection works mostly at the source level, and there are few studies at the binary level. However, because of commercial copyright protection and other reasons, the source code is often unavailable. Binary vulnerability code clone detection has therefore become an increasing focus of research and practice in the security field and has attracted more and more researchers' attention.

Compared with source-level detection, binary vulnerability code clone detection faces more challenges. On the one hand, binary programs are hard to understand and lack semantic information such as functions, variables, and variable types, so source-based techniques cannot be applied to them. On the other hand, detection methods based on text and parsing have low accuracy. Methods based on semantic analysis improve accuracy to some extent, but they rely on graph and tree matching, which is very slow and cannot be used in practical applications.

Considering the importance and difficulty of binary vulnerability code clone detection and the strong learning ability of deep learning models, this thesis designs and implements a deep-learning-based binary vulnerability code clone detection method built on semantic learning. The main contributions are as follows:

1) A “pre-training + fine-tuning” training mode is proposed, and a method for constructing large-scale training sample pairs is designed and implemented, which addresses the low model precision caused by insufficient training samples.
2) The existing semantic representation of binary code is improved: a feature representation that combines basic block features with structural features based on the semantic flow graph is proposed, and the richer semantic information improves detection accuracy.
3) A binary vulnerability code clone detection model based on semantic learning is proposed. It integrates function semantic information into the network model and thereby improves the accuracy of binary vulnerability code clone detection.

To evaluate the effectiveness of the method, extensive comparative experiments were carried out. They show that the proposed method detects better, with a detection result about 10% higher than existing methods such as Gemini. The method is also fast while maintaining accuracy: the similarity detection of one pair of input samples takes about 0.19 s on average.
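The abstract does not say how the similarity of a pair of input samples is scored. As a minimal sketch only, assuming each binary function has already been mapped to a fixed-length embedding vector by some model, a pair can be compared with cosine similarity, which is the measure Gemini uses on its graph embeddings. The function names, the example vectors, and the 0.8 threshold below are hypothetical and are not taken from the thesis.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine similarity of two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def is_clone(emb_a, emb_b, threshold=0.8):
    """Flag a pair as a clone candidate when its similarity reaches the threshold.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    # Hypothetical embeddings of a known-vulnerable function and two candidates.
    vulnerable = [0.9, 0.1, 0.4]
    candidate_close = [0.8, 0.2, 0.5]
    candidate_far = [-0.7, 0.6, -0.1]
    print(is_clone(vulnerable, candidate_close))
    print(is_clone(vulnerable, candidate_far))
```

Scoring a pair this way costs one pass over the two vectors, which is why embedding-based detectors avoid the slow graph and tree matching mentioned above.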
Keywords/Search Tags: Binary Code, Vulnerability Mining, Clone Detection, Deep Learning, Semantic Learning