Research On Cross-Architecture Binary Vulnerability Detection Based On Code Similarity Comparison

Posted on: 2022-10-23 | Degree: Master | Type: Thesis
Country: China | Candidate: B Chen | Full Text: PDF
GTID: 2518306521957949 | Subject: Computer technology
Abstract/Summary:
Code reuse can greatly improve the efficiency of software development, but it also introduces security risks. If a reused code snippet contains a vulnerability, every software system built on it is affected, so vulnerability detection has long been an important research problem in software security. However, source code is unavailable for most commercial software and device firmware images. In addition, with the spread of IoT devices, more and more programs are ported to platforms with different architectures. Cross-platform binary vulnerability detection has therefore become a growing focus of research in this field.

Binary code similarity detection measures the similarity between two or more binary program components. Given published samples of known vulnerabilities, similarity analysis helps security analysts quickly locate the same or similar vulnerabilities in other software. However, existing cross-platform binary code similarity detection methods rely on manually selected basic-block statistical features and internal program structure to represent binary code. On the one hand, manually selected statistical features are relatively coarse-grained and depend heavily on expert knowledge, which makes human error easy to introduce. On the other hand, the structural characteristics of a program change significantly with the instruction set architecture and the compilation configuration.

Inspired by text similarity analysis in natural language processing, this paper uses a neural machine translation model to automatically capture the semantic information of binary functions through unsupervised learning and to generate an embedding vector for each function. On this basis, a cross-platform binary vulnerability detection framework is designed and implemented. The main contributions of this paper are as follows:

1. A phased data preprocessing process is proposed. Because a raw binary function cannot be used directly as training input for a neural machine translation model, this paper designs a multi-sequence generation algorithm for binary functions that converts the control flow graph (CFG) of a function into multiple linear sequences of assembly instructions, where each sequence represents a potential execution path.
2. A "pre-training + fine-tuning" model for generating semantic embedding vectors of binary functions is designed and proposed. Working at function granularity, the neural machine translation model automatically extracts the semantic information of a binary function. Compared with manual feature extraction, it retains more of the original semantics of the assembly instructions together with the structural information inside the program.
3. Based on binary code similarity comparison, a general framework for cross-platform binary vulnerability detection is designed and implemented. The framework detects real vulnerabilities in real firmware and achieves higher accuracy than existing methods, which demonstrates that it can effectively address binary vulnerability detection across instruction set architectures.
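
The multi-sequence generation step in contribution 1 can be illustrated with a minimal sketch. The abstract does not give the thesis's actual algorithm, so everything here is an assumption made for illustration: the toy CFG, the ARM-like instruction strings, the function names `enumerate_paths` and `paths_to_sequences`, the `max_paths` cap, and the choice to cut loops by visiting each basic block at most once per path.

```python
from typing import Dict, List

# Hypothetical toy representation (not taken from the thesis):
# the CFG maps a basic-block id to the ids of its successor blocks,
# and `instructions` maps a block id to its assembly instructions.
Cfg = Dict[str, List[str]]


def enumerate_paths(cfg: Cfg, entry: str, max_paths: int = 16) -> List[List[str]]:
    """List acyclic block paths from `entry` using an iterative depth-first walk.

    A block is visited at most once per path, so loops are cut at the back edge.
    `max_paths` is an assumed cap that bounds path explosion.
    """
    paths: List[List[str]] = []
    stack = [(entry, [entry])]
    while stack and len(paths) < max_paths:
        block, path = stack.pop()
        successors = [s for s in cfg.get(block, []) if s not in path]
        if not successors:
            # Exit block, or a block whose only successors are already on the path.
            paths.append(path)
            continue
        # Push in reverse so the first successor is explored first.
        for succ in reversed(successors):
            stack.append((succ, path + [succ]))
    return paths


def paths_to_sequences(
    paths: List[List[str]], instructions: Dict[str, List[str]]
) -> List[List[str]]:
    """Flatten each block path into one linear sequence of instructions."""
    return [[ins for block in path for ins in instructions[block]] for path in paths]


# Toy function with a single if/else diamond.
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
instructions = {
    "entry": ["cmp r0, #0", "beq else"],
    "then": ["mov r1, #1", "b exit"],
    "else": ["mov r1, #2"],
    "exit": ["bx lr"],
}

paths = enumerate_paths(cfg, "entry")
sequences = paths_to_sequences(paths, instructions)
for seq in sequences:
    print(" ; ".join(seq))
```

Each printed line is one linear instruction sequence for one potential execution path, which is the kind of input a sequence model can consume. A real implementation would also have to decide how to handle loops, indirect branches, and very large functions; the cap and the loop cut above are only one simple option.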
Keywords/Search Tags: Binary code, Similarity comparison, Cross-architecture, Bug search, Neural machine translation