Research On Software Vulnerability Detection Method Based On Deep Learning

Posted on: 2023-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: M Qin
GTID: 2558306905986939
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the continued growth of the Internet, software has come to play a vital role in many areas, and its scale keeps expanding. Negligence by practitioners during software design and development inevitably introduces security issues, and the number of vulnerabilities in software is increasing day by day. As the first level of screening before software is deployed and used, discovering and fixing vulnerabilities in the source code as early as possible can greatly reduce the losses and impact they cause. Traditional vulnerability detection methods rely on the prior knowledge and accumulated experience of security experts and lack generalization ability. To address these problems, this thesis proposes a set of software vulnerability detection methods based on deep learning.

This thesis proposes an end-to-end software vulnerability detection scheme that uses deep learning models to determine whether a program contains vulnerabilities based on features of its source code. In the training phase, the syntax features of the source code in the training dataset are extracted through the abstract syntax tree (AST), yielding a set of syntax feature tokens. The semantic information of the source code is then extracted from the program dependency graph (PDG), and an algorithm designed in this thesis generates a corresponding set of complete slices and local slices; the two types of slices contain different degrees of semantic information. Next, to reduce noise in the code slices, this thesis designs a slice data cleaning algorithm that deletes comments irrelevant to the cause of the vulnerability and simplifies complex variable and function names in the source code. A word embedding model then generates vector representations of the slices, which serve as input to the subsequent deep learning model.

This thesis designs and implements a dual-slice fusion network model based on Fully Slices and Core Slices. The model consists of two sub-networks that learn features from the Fully Slice and the Core Slice respectively; the features are then fused and used to classify whether a vulnerability is present. In the detection phase, the same slice generation and data processing pipeline produces vector representations of the Fully Slices and Core Slices, and the trained detector determines whether the sliced code contains vulnerabilities.

Three sets of comparative experiments are conducted on the SARD and NVD datasets, comparing the proposed method with different vulnerability detection methods, with different intermediate code representations, and with different deep learning models, to evaluate its strengths and weaknesses from several perspectives. Experiments show that the proposed method achieves an accuracy of 96.6% on the test data, with a false positive (false alarm) rate of 2.2% and a false negative rate of 9.7%. The method effectively removes the dependence of traditional vulnerability detection on security experts and shows strong generalization in detection.
Keywords/Search Tags: Vulnerability detection, Deep learning, Fully Slice, Core Slice, Feature fusion
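The abstract says syntax features are extracted through the abstract syntax tree to obtain a syntax feature token set, but it does not describe the parser or the token scheme. The thesis works on C/C++ code from SARD and NVD; purely as a stand-in, the sketch below uses Python's standard `ast` module in place of a C/C++ parser and assumes that the "syntax feature tokens" are the AST node type names collected in a pre-order walk. The function names `syntax_tokens` and `syntax_token_set` are hypothetical.

```python
from __future__ import annotations

import ast


def syntax_tokens(source: str) -> list[str]:
    """Return AST node type names in pre-order (depth-first) order.

    Stand-in only: Python's ast module replaces the C/C++ parser the
    thesis would need, and using node type names as "syntax feature
    tokens" is an assumption.
    """
    tree = ast.parse(source)
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return tokens


def syntax_token_set(source: str) -> set[str]:
    """The abstract speaks of a token *set*, so duplicates are dropped."""
    return set(syntax_tokens(source))


if __name__ == "__main__":
    print(syntax_tokens("buf = data[:n]\nprint(buf)"))
```

A pre-order walk keeps parent nodes ahead of their children, so the sequence form preserves some structural order, while the set form only records which constructs (here, for example, a subscript with a slice) occur at all.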
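The abstract does not give the slice generation algorithm, only that complete and local slices are derived from the program dependency graph and carry different amounts of semantic information. As an illustration under loudly stated assumptions, the sketch below stores a PDG as hypothetical edges `(src, dst)` meaning "statement `dst` depends on statement `src`" (data or control dependence), starts from a slicing criterion such as a sensitive API call, and assumes a complete slice is the union of the backward and forward transitive closures while a local slice keeps only statements within `k` dependency hops. `full_slice`, `core_slice`, and both definitions are hypothetical, not the thesis's algorithm.

```python
from collections import deque


def _adjacency(edges, forward):
    """Adjacency map; forward=True follows src -> dst, else dst -> src."""
    adj = {}
    for src, dst in edges:
        a, b = (src, dst) if forward else (dst, src)
        adj.setdefault(a, []).append(b)
    return adj


def _reach(adj, start, max_hops=None):
    """Breadth-first search; keeps nodes within max_hops (None = unbounded)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_hops is not None and depth[node] == max_hops:
            continue
        for nxt in adj.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)


def full_slice(edges, criterion):
    """Hypothetical complete slice: backward plus forward closure."""
    back = _reach(_adjacency(edges, forward=False), criterion)
    fwd = _reach(_adjacency(edges, forward=True), criterion)
    return sorted(back | fwd)


def core_slice(edges, criterion, k=1):
    """Hypothetical local slice: statements within k hops in either direction."""
    back = _reach(_adjacency(edges, forward=False), criterion, k)
    fwd = _reach(_adjacency(edges, forward=True), criterion, k)
    return sorted(back | fwd)


# Toy PDG for a hypothetical C fragment (statement ids are illustrative):
# 1 len = get_len();   2 n = len;               3 char buf[16];
# 4 if (n > 0)         5 memcpy(buf, src, n);   6 log(buf);   7 other = 0;
EDGES = [(1, 2), (2, 4), (2, 5), (3, 5), (4, 5), (5, 6)]
```

With statement 5 (the `memcpy` call) as the criterion, `full_slice(EDGES, 5)` returns `[1, 2, 3, 4, 5, 6]` and `core_slice(EDGES, 5, k=1)` returns `[2, 3, 4, 5, 6]`; statement 7, which has no dependence relation with the call, appears in neither.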
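The cleaning step removes comments and simplifies complex variable and function names, but the abstract does not list the exact rules. The sketch below assumes a common normalization scheme: C-style comments are stripped with regular expressions, and user-defined identifiers are renamed in order of first appearance to `VAR1, VAR2, ...` or `FUN1, FUN2, ...`, while C keywords and a small, hypothetical keep-list of library calls are left intact. `clean_slice` and `KEEP` are illustrative names; the regexes do not handle comment markers inside string literals.

```python
import re

# Hypothetical keep-list: C keywords plus library APIs whose names matter
# for vulnerability semantics. A real list would be much longer.
KEEP = {
    "if", "else", "for", "while", "return", "char", "int", "void",
    "sizeof", "unsigned", "const", "struct",
    "memcpy", "strcpy", "malloc", "free", "printf",
}

BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")
IDENT = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def clean_slice(code: str) -> str:
    """Strip comments, then rename user-defined identifiers."""
    # Comment markers inside string literals are not special-cased.
    stripped = BLOCK_COMMENT.sub(" ", code)
    stripped = LINE_COMMENT.sub("", stripped)

    names = {}
    counters = {"VAR": 0, "FUN": 0}

    def rename(match):
        word = match.group(0)
        if word in KEEP:
            return word
        if word not in names:
            # An identifier followed by "(" is treated as a function name.
            following = stripped[match.end():].lstrip()
            kind = "FUN" if following.startswith("(") else "VAR"
            counters[kind] += 1
            names[word] = f"{kind}{counters[kind]}"
        return names[word]

    renamed = IDENT.sub(rename, stripped)
    # Collapse the whitespace runs left behind by removed comments.
    return " ".join(renamed.split())
```

Applied to a small function containing a block comment, a trailing `//` comment, and a `memcpy` call, the output keeps `int`, `char`, and `memcpy` but maps the function name to `FUN1` and its variables to `VAR1` to `VAR3`, so slices that differ only in naming become identical token sequences.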
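After cleaning, a word embedding model turns each slice into vectors. The abstract does not name the embedding model (word2vec is a common choice in this line of work, but that is an assumption here), so the sketch below shows only the surrounding plumbing: a simple tokenizer, a vocabulary built from training slices with reserved `<pad>` and `<unk>` ids, padding or truncation to a fixed length, and a lookup table filled with seeded random numbers that stand in for trained embeddings. All function names are hypothetical.

```python
import random
import re

# Identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")
PAD, UNK = "<pad>", "<unk>"


def tokenize(code):
    return TOKEN.findall(code)


def build_vocab(slices):
    """Map each token seen in training slices to an integer id."""
    vocab = {PAD: 0, UNK: 1}
    for code in slices:
        for tok in tokenize(code):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(code, vocab, max_len):
    """Token ids, truncated or right-padded to exactly max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(code)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def random_embeddings(vocab_size, dim, seed=0):
    """Placeholder table; a trained embedding model would supply these."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(vocab_size)]
    table[0] = [0.0] * dim  # keep the padding vector at zero
    return table


def embed(ids, table):
    """Sequence of token ids -> sequence of vectors (one row per token)."""
    return [table[i] for i in ids]
```

Fixing the sequence length lets slices of different sizes be batched for the network; tokens unseen during training (for example a library call absent from the training slices) fall back to the `<unk>` id.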
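The dual-slice fusion model is described only as two sub-networks, one for the Fully Slice and one for the Core Slice, whose features are fused before classification. The branch architecture and fusion operator are not stated, so the forward-pass sketch below assumes each branch is mean pooling over token vectors followed by one dense ReLU layer, fusion is concatenation, and the classifier is a single logistic unit. Weights are hand-set toys and training is omitted; pure Python keeps it dependency-free, whereas a real model would use a deep learning framework and likely recurrent or convolutional branches. All names are hypothetical.

```python
import math


def mean_pool(vectors):
    """Average a non-empty list of equal-length token vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dense_relu(x, weights, bias):
    """relu(W x + b), with W given as a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def branch(vectors, params):
    """Hypothetical sub-network: mean pooling, then a dense ReLU layer."""
    return dense_relu(mean_pool(vectors), params["W"], params["b"])


def fusion_predict(full_vecs, core_vecs, full_params, core_params,
                   out_w, out_b):
    """Probability that the slice pair is vulnerable."""
    h_full = branch(full_vecs, full_params)   # Fully Slice features
    h_core = branch(core_vecs, core_params)   # Core Slice features
    fused = h_full + h_core                   # list concatenation = fusion
    logit = sum(w * f for w, f in zip(out_w, fused)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))
```

Concatenation keeps the two feature vectors side by side, so the output layer can weight evidence from the broader Fully Slice context and the narrower Core Slice separately; other fusion choices (summation, attention) would fit the same two-branch outline.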
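The reported accuracy, false positive (false alarm) rate, and false negative rate presumably follow the standard confusion-matrix definitions with "vulnerable" as the positive class: accuracy = (TP + TN) / (TP + TN + FP + FN), false positive rate = FP / (FP + TN), and false negative rate = FN / (FN + TP). The sketch below computes them; the class and function names are illustrative, and the counts used with it are made up, not the thesis's test results.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # vulnerable samples flagged as vulnerable
    tn: int  # non-vulnerable samples passed as clean
    fp: int  # non-vulnerable samples wrongly flagged (false alarms)
    fn: int  # vulnerable samples missed

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        return self.fn / (self.fn + self.tp)


def confusion_from(labels, preds):
    """Count outcomes; 1 = vulnerable (positive), 0 = not vulnerable."""
    pairs = list(zip(labels, preds))
    return Confusion(
        tp=sum(1 for y, p in pairs if y == 1 and p == 1),
        tn=sum(1 for y, p in pairs if y == 0 and p == 0),
        fp=sum(1 for y, p in pairs if y == 0 and p == 1),
        fn=sum(1 for y, p in pairs if y == 1 and p == 0),
    )
```

With hypothetical counts `Confusion(tp=90, tn=880, fp=20, fn=10)`, accuracy is 0.97, the false positive rate is 20/900 (about 0.022), and the false negative rate is 0.10. Because accuracy pools both classes, it can stay high while the false negative rate is several times the false positive rate whenever non-vulnerable samples dominate the test set.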