
The Study and Implementation of Software Vulnerability Detection Based on Large-Scale Open-Source Repositories

Posted on: 2021-02-26 | Degree: Master | Type: Thesis
Country: China | Candidate: X Ke
GTID: 2428330611457099 | Subject: Computer application technology
Abstract/Summary:
The growing scale and complexity of software pose severe challenges to research on software security vulnerabilities, and automatic vulnerability detection is one of the problems that most urgently needs to be solved. Rule-based methods and traditional machine-learning-based methods depend on vulnerability rules and vulnerability features that must be defined by hand. With the development of deep learning, recently introduced deep-learning-based vulnerability detection methods can overcome this limitation and detect vulnerabilities automatically. However, existing deep-learning-based methods are mainly trained on data sets drawn from vulnerability databases. Although they achieve high detection accuracy on such data sets, their accuracy drops sharply when the models are used to examine code written by developers. Our experimental analysis attributes this to two causes. First, vulnerability-database data sets are small and cover only a limited range of vulnerabilities, whereas the code developers write in practice is more complex and diverse. Second, existing methods ignore the control dependencies and hierarchical structure of programs, and so lose the connection between program semantics and vulnerability features. This thesis improves on existing work in both respects and proposes a solution based on deep learning. Its research contributions can be summarized as follows:

(1) To address the small size, low coverage, and missing semantic information of the data sets used in source-code vulnerability detection, we propose a vulnerability detection model that uses a Bidirectional Long Short-Term Memory (BiLSTM) network to examine code. On the one hand, the model is built from code written by developers in open-source repositories. On the other hand, to better capture the relationship between program semantics and vulnerability features, we first propose a slicing method that uses both data dependencies and control dependencies to process source code; we then use a distributed representation of source code, based on the collection of paths in its abstract syntax tree (AST) combined with an attention mechanism, to learn a contextual representation of the code; finally, a BiLSTM neural network automatically learns the vulnerability features of the source code.

(2) We designed and implemented a prototype system named VulFinder based on the proposed source-code vulnerability detection model. This thesis introduces the functions and modules of the system, describes the design process and key algorithms of each component, and demonstrates the vulnerability detection workflow.

(3) We designed and carried out a set of comprehensive experiments to verify and evaluate the proposed model and system from multiple perspectives. The results show that the proposed method achieves a higher F1-score than related work, both on the data set collected from open-source repositories and on the vulnerability-database data set. In addition, it learns vulnerability features automatically and supports fine-grained vulnerability detection.
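The abstract does not spell out the slicing algorithm used in (1). As a minimal sketch, assuming a program dependence graph in which each statement lists the statements it is data-dependent and control-dependent on, a backward slice collects everything the slicing criterion transitively depends on. The toy program, the statement ids, the `DATA_DEPS`/`CONTROL_DEPS` maps, and the choice of an array write as the criterion are all hypothetical; the thesis may also use forward slicing or different criteria.

```python
from collections import defaultdict, deque

# Hypothetical toy program (statement id: code):
#   1: n = read_int()
#   2: buf = alloc(16)
#   3: i = 0
#   4: while i < n:
#   5:     buf[i] = 0      <- slicing criterion (possible out-of-bounds write)
#   6:     i = i + 1
#   7: log("done")

# Data dependencies: statement -> statements whose definitions it uses.
DATA_DEPS = {4: {1, 3, 6}, 5: {2, 3, 6}, 6: {3, 6}}
# Control dependencies: statement -> the predicate deciding whether it runs.
CONTROL_DEPS = {5: {4}, 6: {4}}


def backward_slice(criterion, data_deps, control_deps):
    """Return every statement the criterion depends on, directly or
    transitively, via data or control dependence edges (criterion included)."""
    preds = defaultdict(set)
    for deps in (data_deps, control_deps):
        for stmt, sources in deps.items():
            preds[stmt].update(sources)

    in_slice = {criterion}
    worklist = deque([criterion])
    while worklist:  # breadth-first walk backwards over dependence edges
        stmt = worklist.popleft()
        for src in preds[stmt]:
            if src not in in_slice:
                in_slice.add(src)
                worklist.append(src)
    return in_slice


print(sorted(backward_slice(5, DATA_DEPS, CONTROL_DEPS)))  # [1, 2, 3, 4, 5, 6]
```

Statement 7 is excluded because nothing flows from it into the write at statement 5. Dropping the control edges (`backward_slice(5, DATA_DEPS, {})`) yields only `{2, 3, 5, 6}`: the loop guard at statement 4, and with it the bound `n` from statement 1, falls out of the slice, which illustrates why the abstract stresses control dependencies.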
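The AST-path representation in (1) resembles the code2vec approach: code is represented as a bag of path-contexts (start token, AST path, end token), and attention pools the context vectors into one code vector. The sketch below assumes a hand-built AST for `x = a + b`, a hypothetical deterministic `embed` function standing in for learned embedding tables, and omits the learned fully connected layer; the thesis's actual node types, dimensions, and training are not given in the abstract.

```python
import itertools
import math


class Node:
    """Minimal AST node: a node type plus children; leaves also carry a token."""

    def __init__(self, kind, children=(), token=None):
        self.kind = kind
        self.children = list(children)
        self.token = token


def root_to_leaf_paths(root):
    """Return (leaf, [root, ..., leaf]) for every leaf, left to right."""
    result = []

    def walk(node, stack):
        stack = stack + [node]
        if not node.children:
            result.append((node, stack))
        for child in node.children:
            walk(child, stack)

    walk(root, [])
    return result


def path_contexts(root):
    """Build (start token, path, end token) triples for every pair of leaves."""
    contexts = []
    for (a, pa), (b, pb) in itertools.combinations(root_to_leaf_paths(root), 2):
        k = 0  # shared-prefix length; pa[k - 1] is the lowest common ancestor
        while k < min(len(pa), len(pb)) and pa[k] is pb[k]:
            k += 1
        up = [n.kind for n in reversed(pa[k - 1:])]  # leaf a up to the ancestor
        down = [n.kind for n in pb[k:]]              # ancestor down to leaf b
        path = " ↑ ".join(up) + "".join(" ↓ " + kind for kind in down)
        contexts.append((a.token, path, b.token))
    return contexts


def embed(text, dim=4):
    """Hypothetical stand-in for a learned embedding lookup (deterministic)."""
    seed = sum(map(ord, text)) + 1
    return [math.sin(seed * (i + 1)) for i in range(dim)]


def code_vector(contexts, attention):
    """Attention-weighted sum of context vectors (softmax over dot products)."""
    # Context vector: tanh over [start; path; end]; a learned layer is omitted.
    vectors = [[math.tanh(v) for v in embed(s) + embed(p) + embed(e)]
               for s, p, e in contexts]
    scores = [sum(w * v for w, v in zip(attention, vec)) for vec in vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * vec[j] for w, vec in zip(weights, vectors)) for j in range(dim)]
    return weights, pooled


# Hand-built AST for `x = a + b`
tree = Node("Assign", [
    Node("Name", token="x"),
    Node("BinOp", [Node("Name", token="a"), Node("Name", token="b")]),
])
ctx = path_contexts(tree)
for c in ctx:
    print(c)
# ('x', 'Name ↑ Assign ↓ BinOp ↓ Name', 'a')
# ('x', 'Name ↑ Assign ↓ BinOp ↓ Name', 'b')
# ('a', 'Name ↑ BinOp ↓ Name', 'b')
weights, vec = code_vector(ctx, attention=[0.1] * 12)
```

The attention vector here is a fixed placeholder; in a trained model it is learned, so paths that matter for the prediction receive larger weights.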
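For the last stage of (1), a BiLSTM reads a sequence of input vectors in both directions and concatenates the forward and backward hidden states at each step. The dependency-free sketch below shows only the forward pass, assuming randomly initialised, untrained weights, max-pooling over time, and a single logistic output unit; the pooling and output layer, the dimensions, and the input granularity (tokens or statements) are assumptions, since the abstract does not specify them. A real model would be trained with a deep-learning framework on labelled vulnerable and non-vulnerable samples.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class LSTM:
    """Single-direction LSTM with one weight matrix per gate over [x_t; h_{t-1}]."""

    GATES = ("input", "forget", "output", "cell")

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        width = in_dim + hid_dim
        self.W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                      for _ in range(hid_dim)] for g in self.GATES}
        self.b = {g: [0.0] * hid_dim for g in self.GATES}

    def step(self, x, h, c):
        z = x + h  # list concatenation: [x_t; h_{t-1}]
        pre = {g: [s + b for s, b in zip(matvec(self.W[g], z), self.b[g])]
               for g in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["cell"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, xs):
        h, c = [0.0] * self.hid_dim, [0.0] * self.hid_dim
        states = []
        for x in xs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


def bilstm(xs, fwd, bwd):
    """Concatenate forward states with backward states re-aligned to time order."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def vulnerability_score(xs, fwd, bwd, w_out, b_out=0.0):
    """Max-pool BiLSTM states over time, then apply a logistic output unit."""
    states = bilstm(xs, fwd, bwd)
    pooled = [max(column) for column in zip(*states)]
    return sigmoid(sum(w * p for w, p in zip(w_out, pooled)) + b_out)


rng = random.Random(0)
IN_DIM, HID_DIM = 3, 4
fwd, bwd = LSTM(IN_DIM, HID_DIM, rng), LSTM(IN_DIM, HID_DIM, rng)
sequence = [[rng.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(5)]
w_out = [rng.uniform(-1, 1) for _ in range(2 * HID_DIM)]
states = bilstm(sequence, fwd, bwd)
score = vulnerability_score(sequence, fwd, bwd, w_out)
print(len(states), len(states[0]), round(score, 4))
```

Reversing the backward outputs before concatenation is what makes each time step see both its left context (forward half) and its right context (backward half).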
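The comparison in (3) is reported as an F1-score, the harmonic mean of precision P = TP / (TP + FP) and recall R = TP / (TP + FN): F1 = 2PR / (P + R). A short sketch for binary vulnerable (1) / non-vulnerable (0) predictions follows; the labels are hypothetical and are not data from the thesis.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1, treating label 1 as 'vulnerable'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for eight code slices (not results from the thesis).
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.5, 0.6)
```

Because F1 is a harmonic mean, it sits closer to the weaker of precision and recall, so a detector cannot score well by flagging everything (high recall, low precision) or almost nothing (the reverse).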
Keywords/Search Tags: Vulnerability detection, Deep learning, Security, Bidirectional Long Short-Term Memory