
Research On Binary Software Vulnerability Detecting Technology Based On Similarity Matching

Posted on: 2021-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Xie
Full Text: PDF
GTID: 2518306308469234
Subject: Computer Science and Technology
Abstract/Summary:
Software plays an increasingly important role in the economy and in daily life, yet its security situation is not optimistic. Software vulnerability detection has therefore become one of the hot topics in current security research. Binary code analysis is a commonly used method for detecting vulnerabilities in binary software. Static analysis at the code level is already widely used for this purpose: it judges code similarity either by constructing abstract syntax trees of the code or by computing the edit distance between code execution paths. Binary vulnerability search based on machine learning has also made great progress. These approaches extract typical features of binary code, construct control flow graphs, and train machine learning models to compare code similarity and determine whether binary software contains known vulnerabilities. However, these methods are computationally expensive and not accurate enough, because they fail to make full use of the features available at each level of the binary code and of the semantic information in the instructions.

This thesis proposes FIT, a novel approach that combines deep learning and graph matching to detect software vulnerabilities. For each binary function, the approach first maps each instruction to a fixed-length real-valued vector using a word embedding model, then builds a function pre-screening model based on a Long Short-Term Memory (LSTM) network and a deep neural network. The suspicious binary functions that pass pre-screening are then compared with known vulnerable functions using an enhanced graph matching method. The approach not only makes full use of the features at each level of the binary code but also automatically learns the semantic information of instruction sequences, which improves both the accuracy and the performance of binary software vulnerability detection.

Experiments on datasets built from widely used software such as OpenSSL and CoreUtils, together with other real-world software, show that FIT outperforms state-of-the-art learning-based binary vulnerability detection methods and hybrid methods (Gemini, CVSSA, and discovRE) in both the learning model and graph matching. For the accuracy of the learning model, the AUC of FIT is 96.1%, compared with 88.9% for Gemini, 78.7% for CVSSA, and 65.9% for discovRE. For the accuracy of graph matching, FIT finds more correct matches than any of the other approaches. On real-world binary software, FIT successfully detects known vulnerabilities. In addition, the evaluation of the word embedding model shows that the instruction embeddings it produces effectively capture the semantic information of the instructions and thus improve the vulnerability detection capability of FIT.
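
The AUC values quoted above can be read as the probability that a model assigns a higher similarity score to a randomly chosen truly matching function pair than to a randomly chosen non-matching pair. The following is a minimal sketch of that pairwise definition, not the evaluation code of the thesis; the function name `auc` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (rank) definition.

    scores: similarity scores produced by some model (hypothetical here).
    labels: 1 for a truly matching pair, 0 for a non-matching pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (assumed, not from the thesis): four matching pairs, two non-matching.
scores = [0.9, 0.8, 0.6, 0.2, 0.7, 0.1]
labels = [1, 1, 1, 1, 0, 0]
print(auc(scores, labels))  # 6 of the 8 positive/negative pairs are ordered correctly: 0.75
```

Under this reading, an AUC of 96.1% means that roughly 96 out of 100 such positive/negative pairs are ranked in the correct order. The quadratic loop is adequate for a small illustration; a sort-based computation would be the usual choice for large evaluation sets.
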
Keywords/Search Tags: software security, binary similarity, deep learning, graph matching