
Software Vulnerability Detection Method Based On Code Semantic Vector Representation And Deep Learning

Posted on: 2021-05-30 | Degree: Master | Type: Thesis
Country: China | Candidate: W G Zhang
GTID: 2428330611498152 | Subject: Computer technology
Abstract/Summary:
In recent years, as the Internet and software technology have developed, software has come to play an increasingly important role in daily life, and software systems keep growing in scale. At the same time, growing scale and developer negligence mean that software quality varies and vulnerabilities are increasingly common, which degrades user experience and security and poses a serious threat to life and property. The earlier a vulnerability is detected, the smaller the loss, so timely detection matters to software maintainers. Detecting vulnerabilities in code manually is complex and time-consuming, so automated tools are needed for batch detection.

Current code vulnerability detection methods have high false negative and false positive rates, and their detection granularity is relatively coarse, generally at the file level. The work of this thesis therefore consists of two parts. The first part extracts features from the code and learns vector representations of those features. The second part feeds the feature vectors of code slices into a vulnerability detection model, which outputs the probability that the code is vulnerable. The key difficulty is extracting features that capture the characteristics of code vulnerabilities and transforming them into vectors that the detection model can take as input. The vectorization method is also very important: the semantic relationships between code tokens must be learned from the context of code statements. In addition, different deep learning networks are needed for different vulnerability types to achieve relatively good detection results.

The main work of this thesis is summarized as follows. First, different vulnerability data sets are analyzed, and high-quality ones are selected and preprocessed. The abstract syntax tree and program dependency graph of each source file are generated, the program is sliced according to the nodes in the abstract syntax tree and the data and control dependencies in the program dependency graph, and the slices are transformed into vectors for vulnerability detection. Four types of vulnerabilities are detected: those related to pointer use, array use, arithmetic expression use, and function calls. The data set for each type is tested separately. Slicing from the key points of each vulnerability type filters out irrelevant statements, reduces noise, and makes the prediction results more accurate.

Second, the extracted features must be vectorized before they can be input into the vulnerability detection model. Word vectors are obtained with word2vec, and sentence vectors are then derived from the word vectors. Machine learning methods are used to preliminarily verify the effect, and the better vectorization method is selected for subsequent training and testing of the detection model. Finally, the vector representations of the vulnerable code are used to train deep learning models for vulnerability detection. In addition to the commonly used BiLSTM model, this thesis also uses two other deep learning models, TextCNN and DenseCNN, and tests their detection results. An attention mechanism is also added to the model to further improve the accuracy of vulnerability detection.
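
The slicing and vectorization steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy program, the dependence edges, the tokenizer, and the two-dimensional embeddings are all hypothetical stand-ins, and mean pooling is only one assumed way to derive a sentence vector from word vectors (the abstract does not say which method is used). The sketch walks backward over dependence edges from a key point (here an array write) and then averages the token vectors of the resulting slice.

```python
from collections import deque

# Hypothetical toy program: statement id -> source text.
STATEMENTS = {
    1: "int n = read_len();",
    2: "char buf[16];",
    3: "int unused = 42;",
    4: "if (n > 0)",
    5: "buf[n] = 0;",
}

# Hypothetical dependence edges: statement -> statements it depends on
# (data or control dependence), as a program dependency graph would provide.
DEPENDS_ON = {
    4: [1],        # the condition reads n
    5: [1, 2, 4],  # reads n and buf, and is guarded by the if
}


def backward_slice(key_point, depends_on):
    """Return sorted ids of the key point and everything it transitively depends on."""
    seen = {key_point}
    queue = deque([key_point])
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


def tokenize(statement):
    """Very rough tokenizer: pad punctuation with spaces, then split."""
    for symbol in "()[];=>":
        statement = statement.replace(symbol, f" {symbol} ")
    return statement.split()


def sentence_vector(tokens, embeddings, dim):
    """Mean-pool token vectors; unknown tokens map to the zero vector."""
    total = [0.0] * dim
    for token in tokens:
        vec = embeddings.get(token, [0.0] * dim)
        total = [a + b for a, b in zip(total, vec)]
    return [value / len(tokens) for value in total] if tokens else total


# Slice from the array write in statement 5; statement 3 is irrelevant noise.
slice_ids = backward_slice(5, DEPENDS_ON)
slice_tokens = [tok for i in slice_ids for tok in tokenize(STATEMENTS[i])]

# Stand-in embeddings; a trained word2vec model would supply real ones.
TOY_EMBEDDINGS = {"buf": [1.0, 0.0], "n": [0.0, 1.0]}
slice_vector = sentence_vector(slice_tokens, TOY_EMBEDDINGS, dim=2)

print(slice_ids)
print(slice_vector)
```

In this toy case the slice drops the unrelated statement 3, which mirrors the noise-reduction argument above. A fixed-length vector like `slice_vector` (or the sequence of per-token vectors) is the kind of input a BiLSTM or TextCNN classifier would then consume.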
Keywords/Search Tags: code vulnerability detection, program slicing, bidirectional long short-term memory network, convolutional neural network