Research On Binary Code Recurring Vulnerability Detection Method

Posted on: 2023-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: M J Yang
Full Text: PDF
GTID: 2568306617952799
Subject: Software engineering
Abstract/Summary:
With the development of electronic information technology, the number and variety of software products keep growing, and the number of vulnerabilities is growing significantly with them. At the same time, the widespread reuse of open source code in development has enlarged the reach of each vulnerability, seriously threatening the security of data and information in many fields. Research on vulnerability detection has therefore received increasing attention. Most software is distributed as binary files, and the source code of much software is unavailable because of intellectual property protection, so vulnerability detection that works on binary code has greater practical significance. Compared with source code, however, binary code lacks the high-level attributes and structural information of the program, which makes it harder to analyze and to extract features from.

Existing vulnerability detection methods based on binary code similarity have two problems. First, they are sensitive to changes in software version and compiler version, which leads to inaccurate vulnerability matching and missed detections. Second, they report false vulnerabilities because the difference between a vulnerable function and its patched counterpart is small. This thesis proposes a novel recurring vulnerability detection method, δ-Match. The method constructs a co-occurrence matrix from the bi-grams of a function's instructions, uses a Siamese-CNN model for feature extraction to achieve function matching, and finally uses the difference between the co-occurrence matrices of the vulnerable function and the patched function to extract regional features strongly related to the vulnerability for vulnerability/patch identification. By combining binary analysis with deep learning, the method extracts the key semantic features of functions so that they can be matched accurately, which reduces the false negative rate of vulnerability detection. At the same time, the method uses the difference between vulnerabilities and patches to process the objective function, which reduces the false positive rate of vulnerability detection.

In this thesis, δ-Match has been implemented and evaluated on two tasks: function matching and vulnerability detection. For function matching, the thesis first performs cross-software-version function matching on 6 versions of Coreutils. The results show that even when the version span is large, recall@1 still reaches 97.5%, which is 49.3% and 24.1% higher than Bindiff and α-diff, respectively. Second, the thesis performs cross-compiler-version function matching on tcpdump compiled with four gcc versions. The results show that when the version span is large, recall@1 is 12.3% higher than Bindiff. For vulnerability detection, 390 collected vulnerable functions were used to perform cross-version vulnerable function detection on 334 versions of 6 projects, and the results were compared with two other methods. They show that the proposed method accurately detects the vast majority of vulnerabilities, reaches an F1-score of 86.8%, and effectively reduces both the false negative rate and the false positive rate of vulnerability detection. The thesis also tested for vulnerabilities in the third-party components on which software depends; the F1-score of vulnerability detection reached 93.8%, which supports the applicability of the proposed method in different scenarios. In addition, to address the lack of public datasets in this field, the thesis constructs a large-scale binary function feature database and a vulnerability information database to facilitate future research on binary analysis and vulnerability detection.
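The co-occurrence matrix and the vulnerable/patched difference described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how instructions are tokenized or normalized, so the sketch assumes opcode mnemonics only, a small fixed vocabulary, and toy instruction sequences. The function names cooccurrence_matrix, matrix_difference, and changed_cells are hypothetical, and the Siamese-CNN matching stage is omitted.

```python
def cooccurrence_matrix(opcodes, vocab):
    """Count adjacent opcode pairs (bi-grams) in a len(vocab) x len(vocab) matrix.

    Assumption: rows index the first opcode of a pair, columns the second.
    Opcodes outside the vocabulary are skipped.
    """
    index = {op: i for i, op in enumerate(vocab)}
    size = len(vocab)
    matrix = [[0] * size for _ in range(size)]
    for first, second in zip(opcodes, opcodes[1:]):
        if first in index and second in index:
            matrix[index[first]][index[second]] += 1
    return matrix


def matrix_difference(vulnerable, patched):
    """Element-wise difference (patched minus vulnerable) of two matrices."""
    return [
        [p - v for v, p in zip(row_v, row_p)]
        for row_v, row_p in zip(vulnerable, patched)
    ]


def changed_cells(delta):
    """Cells where the two matrices disagree: candidate patch-related regions."""
    return {
        (i, j)
        for i, row in enumerate(delta)
        for j, value in enumerate(row)
        if value != 0
    }


# Toy data (assumed, not from the thesis): the patch adds a bounds check
# (cmp + jle) before the call.
VOCAB = ["mov", "cmp", "jle", "call", "ret"]
vulnerable_fn = ["mov", "call", "mov", "ret"]
patched_fn = ["mov", "cmp", "jle", "call", "mov", "ret"]

m_vuln = cooccurrence_matrix(vulnerable_fn, VOCAB)
m_patch = cooccurrence_matrix(patched_fn, VOCAB)
delta = matrix_difference(m_vuln, m_patch)
print(sorted(changed_cells(delta)))
```

In this toy case only the cells touched by the added check differ, which matches the intuition in the abstract that a small region of the matrix separates a vulnerable function from its patched version.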
Keywords/Search Tags:vulnerability detection, binary code analysis, Siamese-CNN, bi-gram