As society becomes increasingly information-driven, industries depend more heavily on information technology, and expectations for software stability rise accordingly. At the same time, software products are growing more diverse and are deployed in an ever wider range of scenarios, so the social value that software carries, and with it the importance of software security, is becoming increasingly prominent. Given the current explosive growth in software vulnerabilities, a key challenge in the field is overcoming the limitations of existing vulnerability detection technologies: limited detection scope, heavy reliance on expert knowledge, coarse detection granularity, and difficulty in locating vulnerabilities. To address these challenges, this thesis studies intelligent methods for detecting and localizing vulnerabilities in software source code, combining deep learning models with program slicing to detect and localize vulnerabilities efficiently and automatically. The main research work of this thesis is summarized as follows:

(1) A deep learning based vulnerability detection method for mixed-language software projects is proposed. Existing deep learning based vulnerability detection technologies mainly learn features for a single programming language and cannot achieve high-precision detection for software projects written in several programming languages. First, a named entity recognition model based on a bidirectional gated recurrent unit with a conditional random field (BiGRU-CRF) was constructed, and named entity recognition for multilingual source code together with a program slice reconstruction algorithm was implemented. Then, a vulnerability detection model based on a bidirectional long short-term memory (BiLSTM) network was constructed to learn the
vulnerability code characteristics of multilingual software projects. Experimental results on the SARD and CrossVul datasets show that the method achieves an average recall of 94.5% and an F1 score of 92.5% in identifying software vulnerabilities. Compared with the latest deep learning based vulnerability detection method, its average recall is 5.7% higher and its F1 score is 4.6% higher.

(2) Accurately locating software vulnerabilities is the foundation for developers to patch them. However, most existing deep learning based vulnerability detection tools have coarse detection granularity and cannot pinpoint the line on which a vulnerability occurs, while traditional rule-based tools rely heavily on expert knowledge and cannot cover all vulnerability features. To address this, a fine-grained vulnerability localization method based on LLVM is proposed. The core of the method is to slice the source code at two different granularities, based on vulnerability syntax features and on the intermediate code representation, and to extract fine-grained code blocks that mark the location of vulnerabilities. A fusion layer is then added to the conventional bidirectional gated recurrent unit (BiGRU) model to assign higher weights to the slices associated with the code lines that give rise to the vulnerability. Finally, a fine-grained software vulnerability localization model is trained. Experimental results show that, on a sub-dataset built from the SARD and NVD datasets and containing four types of vulnerability features, the localization accuracy of the method reaches 29.4%, and its overall detection performance is also better than that of the other compared methods.

(3) A prototype system for intelligent detection and localization of software source code vulnerabilities has been designed and implemented. The prototype system integrates the mixed-language software project vulnerability detection and fine-grained vulnerability
localization methods, including modules such as file management, model training, and vulnerability detection and localization, and is engineered using a visualization framework. The prototype system provides technical support for verifying the above methods and has some practical value.
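
The recall and F1 figures quoted for contribution (1) follow the standard definitions for binary classification. As a minimal sketch, assuming binary labels where 1 marks a vulnerable sample and 0 a benign one (the function name `recall_and_f1` and the toy labels are illustrative and not taken from the thesis), the metrics could be computed as follows:

```python
from typing import Sequence, Tuple


def recall_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, F1) for binary labels: 1 = vulnerable, 0 = benign.

    Hypothetical helper for illustration; not the evaluation code of the thesis.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # vulnerable, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # vulnerable, missed

    # Guard against division by zero when a class is absent.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return recall, f1


if __name__ == "__main__":
    # Toy labels, made up purely to exercise the function.
    truth = [1, 1, 1, 0, 0]
    preds = [1, 1, 0, 1, 0]
    print(recall_and_f1(truth, preds))
```

Recall measures the share of truly vulnerable samples that the detector flags, while F1 balances recall against precision, which is why the two are reported together.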