Research On Software Vulnerability Detection Method Based On Code Feature Learning

Posted on: 2023-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: W Lin
Full Text: PDF
GTID: 2568306776475384
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the development of the information society and frequent attacks on software systems, security has become an issue that must be addressed. Within software security, automated detection of software vulnerabilities is an important subject. The rapid development of software technology and the growing diversity of user needs have significantly increased the complexity of software, which in turn raises the probability that vulnerabilities exist in the source code. In addition, open-source software has become a general trend, and developers' trust in open-source code and their direct reuse of it give vulnerabilities an opportunity to spread, which creates security risks. Attackers can exploit software vulnerabilities for malicious intrusion or damage, threatening the reliability, integrity, and access control of the system or its data and ultimately causing incalculable damage. Research on source code vulnerability detection therefore still needs to be strengthened.

Traditional rule-based vulnerability detection methods are highly subjective and narrowly targeted, and they require expert knowledge of the subject. Code similarity-based vulnerability detection methods suffer from a high false negative rate when detecting vulnerabilities that are not caused by code cloning. To avoid these problems, this thesis adopts a deep learning-based vulnerability detection method, which generalizes better. The study uses the semantic features of the source code and an improved temporal convolutional network (TCN) to implement vulnerability detection and thereby improve detection capability. This method does not require the code to be executable. For executable source code, in order to enrich the features, the study proposes a vulnerability detection method based on the semantic features of both the source code and the Low Level Virtual Machine Intermediate Representation (LLVM IR). Finally, a prototype system for software vulnerability detection based on code feature learning, which integrates both methods, is designed and implemented.

The major contributions of this thesis are summarized as follows:

1. Recurrent neural network variants, which are currently prevalent in source code vulnerability detection, are at a disadvantage in parallel data processing, while the receptive field of traditional convolutional neural networks is limited by the convolutional kernel size. TCN performs well on sequential tasks thanks to its high parallelism, flexible receptive fields, and stable gradients. However, TCN cannot capture the bidirectional semantics of the source code simultaneously, since it is not a bidirectional network structure. This thesis proposes a vulnerability detection method based on the semantic features of source code and an improved TCN. The improved TCN can effectively reduce the impact of unimportant information in the source code on the vulnerability detection task and capture the contextual semantics of the source code. Compared with the traditional TCN, the model increased accuracy by 3.75%, 2.16%, and 2.55% on the BE-ALL, RM-ALL, and HY-ALL datasets, respectively.

2. C/C++ source files often contain macro or type definitions, or references to header files. The code slices generated by the slicing approach do not include this content, so some information is lost. To tackle this problem, this thesis proposes a vulnerability detection method based on the semantic features of source code and LLVM IR, which applies to executable C/C++ code. The method uses the semantic features of LLVM IR as auxiliary input to enrich the features fed into the neural network. The experimental results demonstrate that it further enhances vulnerability detection capability compared with the method based on source code semantic features alone.

3. A prototype system for software vulnerability detection based on code feature learning is designed and implemented. The prototype system integrates the code feature extraction algorithms, the improved TCN model, and other neural network models for comparison. It is divided into five main modules: a source code slice generation module, an LLVM IR slice generation module, a vectorization module, a network model implementation module, and a data presentation module. The prototype system has practical value for code vulnerability detection.
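The TCN properties named in the first contribution (causal, dilated convolution; a receptive field that grows with dilation; no view of later tokens) can be illustrated with a small sketch. The abstract does not give the architecture of the improved TCN, so everything below is an assumption for illustration only: the function names are hypothetical, the "bidirectional" variant simply runs the same causal filter over the reversed sequence, and the soft-thresholding step is borrowed from the Deep Residual Shrinkage Network named in the keywords as one plausible way to suppress unimportant features, not as the thesis's confirmed design.

```python
from typing import List, Sequence, Tuple


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float],
                        dilation: int) -> List[float]:
    """1-D causal convolution: out[t] uses only x[t], x[t-d], x[t-2d], ..."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:  # positions before the sequence start count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int],
                    convs_per_block: int = 1) -> int:
    """Number of input steps one output step can see in a stack of layers."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


def bidirectional_conv(x: Sequence[float], weights: Sequence[float],
                       dilation: int) -> List[Tuple[float, float]]:
    """Assumed bidirectional variant: pair a left-to-right pass with a
    right-to-left pass (the same causal filter applied to the reversed input)."""
    forward = causal_dilated_conv(x, weights, dilation)
    backward = causal_dilated_conv(list(x)[::-1], weights, dilation)[::-1]
    return list(zip(forward, backward))


def soft_threshold(x: Sequence[float], tau: float) -> List[float]:
    """Shrink values toward zero by tau; values within [-tau, tau] become 0.
    In a shrinkage network tau is learned; here it is a fixed assumption."""
    out = []
    for v in x:
        if v > tau:
            out.append(v - tau)
        elif v < -tau:
            out.append(v + tau)
        else:
            out.append(0.0)
    return out


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0]  # stand-in for one feature channel
    print(causal_dilated_conv(tokens, [1.0, 1.0], dilation=2))
    print(receptive_field(kernel_size=3, dilations=[1, 2, 4]))
    print(bidirectional_conv(tokens, [1.0, 1.0], dilation=1))
    print(soft_threshold([-2.0, -0.5, 0.5, 3.0], tau=1.0))
```

With kernel size 3 and dilations 1, 2, and 4, a single output position sees 15 input positions, which is the sense in which the receptive field is flexible: it is set by the dilation schedule rather than by the kernel size alone.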
Keywords/Search Tags: Vulnerability Detection, Deep Learning, Temporal Convolutional Network, Deep Residual Shrinkage Network, Intermediate Representation