
Research On Vulnerability Detection Based On BiLSTM Model

Posted on: 2021-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: K L Gong
Full Text: PDF
GTID: 2518306479454014
Subject: Master of Engineering
Abstract/Summary:
As computer technology is applied ever more widely, the number of software systems and the demand for them continue to grow, and development is becoming steadily more difficult. Code reuse and growing complexity make it almost unavoidable that software contains a large number of vulnerabilities. Hidden in massive code bases, these vulnerabilities are hard to find, and once exploited they can cause irreparable economic losses. To discover software vulnerabilities in time, this paper proposes a vulnerability detection method based on the BiLSTM model.

First, the method extracts method bodies from the source code to form a method set and constructs an abstract syntax tree (AST) for each method in the set. Second, it uses the AST to extract the statements of each method into a statement set. After user-defined variable names, method names and strings are replaced with uniform identifiers, each statement is assigned its own node number, forming a node set. Third, data flow and control flow analysis are used to extract the data dependencies and control dependencies between nodes. The node set extracted from the method body is then combined with these two kinds of dependency to form a feature representation of the method, which is further converted into a feature matrix by one-hot encoding. Finally, each matrix is labeled with a vulnerability tag to generate training samples, and a neural network is trained on them to obtain the corresponding vulnerability classification model. To better learn the context information of the sequence, a BiLSTM network is selected, and an Attention layer is added to further improve model performance.

Two experiments were set up to verify the effectiveness of the proposed method. Experiment 1 was performed on a buffer error dataset. Accuracy and recall reached 95.31% and 93.52% respectively, better than the existing machine learning-based detection methods VulDeePecker and AE-KNN, which shows that the method detects security vulnerabilities in code more accurately. Experiment 2 was performed on datasets of three vulnerability types: resource management errors, input validation, and numerical errors. The results differed somewhat among these datasets, but accuracy and recall both exceeded 80%, which confirms that the proposed method can be applied to different types of vulnerabilities. Finally, the results are analyzed and the reasons for the differences are explained.
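The preprocessing steps described above (identifier replacement, node numbering, one-hot encoding) can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes C-like statements, uses a regular-expression tokenizer as a stand-in for the AST-based statement extraction, and omits the data and control dependency step. The placeholder names `VAR`, `FUN` and `STR`, the keyword and library-call lists, and the helper names `normalize` and `one_hot_matrix` are all hypothetical; the abstract does not specify them.

```python
import re

# Assumption: tokens that must NOT be renamed. The thesis does not list these.
KEYWORDS = {"int", "char", "if", "else", "for", "while", "return", "void"}
LIBRARY_CALLS = {"strcpy", "memcpy", "malloc", "free", "strlen"}

# String literal | identifier | integer | any single punctuation character.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|[^\s\w]')
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def normalize(statement, var_map, fun_map):
    """Replace user-defined names and strings with uniform identifiers."""
    tokens = TOKEN_RE.findall(statement)
    out = []
    for i, tok in enumerate(tokens):
        if tok.startswith('"'):
            out.append("STR")  # every string literal collapses to one token
        elif (IDENT_RE.fullmatch(tok)
              and tok not in KEYWORDS and tok not in LIBRARY_CALLS):
            # An identifier directly followed by "(" is treated as a method name.
            is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
            table, prefix = (fun_map, "FUN") if is_call else (var_map, "VAR")
            # Reuse the placeholder if this name was already seen.
            out.append(table.setdefault(tok, f"{prefix}{len(table) + 1}"))
        else:
            out.append(tok)
    return out


def one_hot_matrix(token_rows):
    """Build a vocabulary and one one-hot row per token, in statement order."""
    vocab = sorted({tok for row in token_rows for tok in row})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for row in token_rows:
        for tok in row:
            vec = [0] * len(vocab)
            vec[index[tok]] = 1
            matrix.append(vec)
    return vocab, matrix


# Toy method body (hypothetical example, not taken from the thesis dataset).
method_body = [
    'char buf[10];',
    'strcpy(buf, input);',
    'log_msg("copied", buf);',
]

var_map, fun_map = {}, {}
# Node set: each normalized statement gets its own node number, starting at 1.
nodes = {n: normalize(stmt, var_map, fun_map)
         for n, stmt in enumerate(method_body, start=1)}
vocab, matrix = one_hot_matrix(list(nodes.values()))

for n, toks in nodes.items():
    print(n, " ".join(toks))
```

In a full pipeline the data and control dependency edges between node numbers would be added to this representation before encoding, and the resulting matrix, together with its vulnerability label, would form one training sample for the BiLSTM classifier.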
Keywords/Search Tags:Vulnerability detection, Feature representation, BiLSTM, Attention, Classification model