Research On Open Source Repository And Graph Neural Network For Vulnerability Detection

Posted on: 2022-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: H T Wang
GTID: 2518306527455244
Subject: Software engineering
Abstract/Summary:
Source code vulnerability detection is a critical technology for protecting software systems from attacks that exploit vulnerabilities. Deep learning is currently an effective research approach to vulnerability detection. However, most existing works treat programs as sequences or as untyped code property graphs during model training. Because the structural information of the code is ignored, a large number of vulnerabilities go undetected. To address these problems, this thesis proposes a novel vulnerability detection framework, FUNDED, which uses graph neural networks (GNNs) to build vulnerability patterns from graph relationships, capturing control-flow, data-flow, call, and dependency relationships in a program. Unlike previous approaches, FUNDED learns a graph representation of the source code in which each statement is connected to other statements through different types of edges. By capturing a range of syntactic, semantic, and data-flow relationships, FUNDED achieves high vulnerability detection accuracy. Furthermore, to improve the quality of the training data set and its ability to generalize, we combine ensemble learning and statistical evaluation to automatically collect high-quality training samples from open source projects. This method supplies many real-world vulnerable code samples to supplement the limited training samples in standard vulnerability databases, providing enough training data to build an effective deep-learning-based bug detection model. The main contributions of this thesis are as follows:

(1) First, the thesis addresses the shortage and uneven quality of training code samples for vulnerability detection models. We use ensemble learning over a variety of traditional machine learning classifiers and propose a soft voting mechanism based on an "expert ensemble." The prediction scores of the individual classifiers guide the collection of training data. In addition, we use nonconformist classification to filter out candidate vulnerability samples of insufficient quality. With these techniques, we construct a high-quality vulnerability data set.

(2) Second, the thesis addresses the poor detection performance caused by weak program representations and by traditional neural networks that cannot capture the deep semantic and structural information in source code. It extends the syntactic relations of the abstract syntax tree with program data flow and control flow, proposing a code property graph with eight different relation types. A gated graph neural network then learns the structural information of this graph to build more effective vulnerability patterns for detection.

(3) Finally, the thesis combines these techniques into a fully automatic vulnerability collection and detection system. It evaluates data set quality for the C, Java, PHP, and Swift languages, compares the data collection approach against five recent data collection models, and compares FUNDED against six vulnerability detection systems. The experiments show that FUNDED performs significantly better than the other methods in every experiment.
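The soft voting idea in contribution (1) can be illustrated with a minimal sketch. It assumes that each classifier outputs a probability pair `[p_benign, p_vulnerable]`, that all classifiers are weighted equally, and that a fixed confidence threshold decides which candidates are kept. The function names, the sample names, and the threshold of 0.8 are hypothetical, and the threshold filter is only a simple stand-in for the nonconformist classification step, not the procedure used in the thesis.

```python
from statistics import mean


def soft_vote(prob_rows):
    """Average the per-class probabilities produced by several classifiers.

    prob_rows holds one [p_benign, p_vulnerable] list per classifier.
    Equal classifier weights are an assumption of this sketch.
    """
    n_classes = len(prob_rows[0])
    return [mean(row[c] for row in prob_rows) for c in range(n_classes)]


def select_samples(candidates, threshold=0.8):
    """Keep candidates whose averaged top-class probability meets the threshold.

    Returns (name, label, confidence) tuples; label 0 is benign, 1 is vulnerable.
    The fixed threshold is a placeholder for a statistical quality filter.
    """
    kept = []
    for name, prob_rows in candidates:
        avg = soft_vote(prob_rows)
        label = max(range(len(avg)), key=avg.__getitem__)
        if avg[label] >= threshold:
            kept.append((name, label, avg[label]))
    return kept


# Hypothetical scores from three classifiers for three candidate code samples.
demo = [
    ("commit_a", [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]]),  # classifiers agree: benign
    ("commit_b", [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]]),  # classifiers disagree
    ("commit_c", [[0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]),  # classifiers agree: vulnerable
]

kept = select_samples(demo)
for name, label, confidence in kept:
    print(name, label, round(confidence, 3))
```

In this sketch, samples on which the classifiers disagree fall below the threshold and are left out of the training set, which mirrors the goal of collecting only samples that the ensemble labels with confidence.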
Keywords/Search Tags:Open Source Repository, Software Vulnerability, Code Vulnerability Detection, Deep Learning, Deep Graph Neural Networks