Research On Open Source Repository And Graph Neural Network For Vulnerability Detection

Posted on:2022-05-21

Degree:Master

Type:Thesis

Country:China

Candidate:H T Wang

GTID:2518306527455244

Subject:Software engineering

Abstract/Summary:

Source code vulnerability detection is a critical technology for protecting software systems from attacks that exploit vulnerabilities. Deep learning is currently an effective research approach for vulnerability detection. However, most existing works treat programs as flat sequences or as untyped code property graphs during model training. Because the structural information of the code is ignored, a large number of vulnerabilities go undetected. To address these problems, this thesis proposes FUNDED, a novel vulnerability detection framework that uses graph neural networks (GNNs) to build vulnerability patterns from graph relationships, capturing the control-flow, data-flow, call, and dependency relationships of a program. Unlike previous approaches, FUNDED learns a graph representation of the source code in which each statement is connected to other statements through different types of edges. By capturing a variety of program syntax, semantics, and data-flow relationships, FUNDED achieves high vulnerability detection accuracy. Furthermore, to improve the quality of the training dataset and the generalization ability of the model, we combine ensemble learning and statistical evaluation to automatically collect high-quality training samples from open source projects. This method supplies many real-world vulnerable code samples to supplement the limited training samples in standard vulnerability databases, providing enough training data to build an effective deep learning-based bug detection model.

The main contributions of this thesis are as follows:

(1) First, we address the shortage and uneven quality of training code samples for vulnerability detection models. We apply ensemble learning over a variety of traditional machine learning classifiers and propose a soft voting mechanism based on an "expert ensemble", in which the prediction scores of multiple classifiers guide the collection of training data. In addition, we use nonconformist (conformal prediction) classification to filter out vulnerability samples whose quality is insufficient for training. Through these measures, we construct a high-quality vulnerability dataset.

(2) Second, we tackle the poor vulnerability detection performance caused by inadequate program representations and by traditional neural networks that cannot capture the deep semantic and structural information in source code. This thesis extends the syntactic relationships of the abstract syntax tree with program data flow and control flow, proposing a code property graph with eight different relationship types, and uses a gated graph neural network to learn the structural information of this graph, building more effective vulnerability patterns for detection.

(3) Finally, this thesis combines the above techniques into a fully automatic vulnerability collection and detection system. We evaluate dataset quality on C, Java, PHP, and Swift, comparing against five recent data collection approaches, and we compare FUNDED with six vulnerability detection systems. Experiments show that FUNDED significantly outperforms the other methods in all experiments.
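The soft voting step in contribution (1) can be illustrated with a minimal sketch. It assumes each traditional classifier already outputs a probability that a candidate sample (for example, a commit mined from an open source project) is vulnerability-related, and that each "expert" carries a hypothetical weight such as its validation accuracy. The abstract does not specify the weighting, so `soft_vote`, `label_candidates`, the weights, and the 0.5 threshold are illustrative assumptions, not FUNDED's actual implementation.

```python
from typing import Dict, List, Sequence, Tuple


def soft_vote(probabilities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-classifier probabilities (weighted soft voting)."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probabilities, weights)) / total


def label_candidates(
    candidates: Dict[str, Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[Tuple[str, float, bool]]:
    """Return (sample_id, ensemble_score, is_vulnerable) for each candidate."""
    results = []
    for sample_id, probs in candidates.items():
        score = soft_vote(probs, weights)
        results.append((sample_id, score, score >= threshold))
    return results


# Hypothetical example: three classifiers weighted by assumed validation accuracy.
expert_weights = [0.9, 0.7, 0.8]
scored = label_candidates(
    {"commit_a": [0.8, 0.6, 0.9], "commit_b": [0.2, 0.4, 0.1]},
    expert_weights,
)
```

Averaging probabilities rather than counting hard votes lets a confident expert outweigh two hesitant ones, and the continuous ensemble score can then feed a later filtering stage.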
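The statistical filtering step in contribution (1) can be sketched as inductive conformal classification. In this sketch, which is an assumption rather than the thesis's exact procedure, the nonconformity of a sample is one minus the ensemble's confidence in its assigned label. A held-out calibration set provides reference scores, and a candidate is kept only when its conformal p-value exceeds a hypothetical significance level `epsilon`. All function names and numbers are illustrative.

```python
from typing import List, Sequence, Tuple


def nonconformity(confidence: float) -> float:
    # Assumption: low ensemble confidence in the assigned label = high nonconformity.
    return 1.0 - confidence


def conformal_p_value(calibration_scores: Sequence[float], new_score: float) -> float:
    """(#calibration scores >= new score + 1) / (n + 1), counting the new sample itself."""
    at_least = sum(1 for s in calibration_scores if s >= new_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def filter_samples(
    calibration_confidences: Sequence[float],
    candidates: Sequence[Tuple[str, float]],
    epsilon: float = 0.1,  # hypothetical significance level
) -> List[str]:
    """Keep candidate IDs whose conformal p-value exceeds epsilon."""
    calibration = [nonconformity(c) for c in calibration_confidences]
    kept = []
    for sample_id, confidence in candidates:
        if conformal_p_value(calibration, nonconformity(confidence)) > epsilon:
            kept.append(sample_id)
    return kept
```

A low p-value means the candidate is less conforming than nearly every calibration sample, so its label is treated as unreliable and the sample is left out of the training set.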
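Contribution (2) relies on a gated graph neural network propagating information along typed edges. The pure-Python sketch below shows one GGNN-style layer. It aggregates messages with a separate weight matrix per edge type, updates node states with a GRU cell, and sum-pools node states into a graph-level score. The three edge-type names are hypothetical stand-ins for the eight relationship types, which the abstract does not list. The weights are random and untrained, biases are omitted, and the sum-pooling readout is an assumption rather than FUNDED's documented architecture.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng: random.Random, dim: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]


class GatedGraphLayer:
    """One GGNN-style propagation layer with one weight matrix per edge type."""

    def __init__(self, edge_types: List[str], dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.edge_weights = {t: rand_matrix(rng, dim) for t in edge_types}
        # GRU parameters (untrained; biases omitted to keep the sketch short).
        self.wz, self.uz = rand_matrix(rng, dim), rand_matrix(rng, dim)
        self.wr, self.ur = rand_matrix(rng, dim), rand_matrix(rng, dim)
        self.wh, self.uh = rand_matrix(rng, dim), rand_matrix(rng, dim)

    def gru(self, m: Vector, h: Vector) -> Vector:
        z = [sigmoid(x) for x in add(matvec(self.wz, m), matvec(self.uz, h))]
        r = [sigmoid(x) for x in add(matvec(self.wr, m), matvec(self.ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(x) for x in add(matvec(self.wh, m), matvec(self.uh, rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def step(self, states: Dict[int, Vector],
             edges: List[Tuple[int, int, str]]) -> Dict[int, Vector]:
        # Typed message aggregation: m_v = sum over edges (u, v, t) of W_t h_u.
        messages = {v: [0.0] * self.dim for v in states}
        for src, dst, etype in edges:
            messages[dst] = add(messages[dst], matvec(self.edge_weights[etype], states[src]))
        return {v: self.gru(messages[v], h) for v, h in states.items()}


def graph_score(states: Dict[int, Vector], readout: Vector) -> float:
    # Sum-pool node states, then a linear layer + sigmoid yields a score in (0, 1).
    pooled = [sum(h[i] for h in states.values()) for i in range(len(readout))]
    return sigmoid(sum(w * x for w, x in zip(readout, pooled)))


# Hypothetical three-statement graph with typed edges (src, dst, edge_type).
layer = GatedGraphLayer(["ast_child", "control_flow", "data_flow"], dim=4, seed=42)
states = {0: [1.0, 0.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0, 0.0], 2: [0.0, 0.0, 1.0, 0.0]}
edges = [(0, 1, "control_flow"), (1, 2, "control_flow"), (0, 2, "data_flow"), (1, 0, "ast_child")]
for _ in range(3):  # a few propagation steps
    states = layer.step(states, edges)
score = graph_score(states, readout=[0.5, -0.5, 0.5, -0.5])
```

Keeping one weight matrix per edge type is what lets the model treat a data-flow edge differently from a control-flow edge, which is the distinction an untyped code property graph loses.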

Keywords/Search Tags:

Open Source Repository, Software Vulnerability, Code Vulnerability Detection, Deep Learning, Deep Graph Neural Networks

Related items

1	Research On Security Detection Of Open Source Software For Source Code
2	Open Source Software Vulnerability Mining Method Based On Knowledge Graph
3	An Approach For Using Deep Learning To Detect Code Vulnerabilities
4	Research On Source Code Vulnerability Detection Based On Deep Learning
5	Research On Source Code Vulnerability Detection Method Based On Graph Neural Network
6	The Study And Implementation Of Software Vulnerability Detection Based On Large-scale Open Source Repositories
7	Research And Implementation Of C-Language Vulnerability Static Detection Based On Flow-analysis And Graph Neural Networks
8	Research On Software Buffer Overflow Vulnerability Detection Method Based On Deep Learning
9	Software Vulnerability Detection Method Based On Code Semantic Vector Representation And Deep Learning
10	Research On The Source Code Vulnerability Mining Technology Based On Deep Learning