As scientific, technological, and socio-economic development accelerates, computer software has permeated nearly every aspect of production and daily life, and both the number and the variety of software systems continue to grow. At the same time, the number of software vulnerabilities has risen sharply, and the widespread use of open-source software amplifies their potential impact, making vulnerabilities a critical concern in computer security. Vulnerability detection has therefore become a key technology for protecting software systems against attacks, and detecting vulnerabilities in source code in a timely manner is crucial to keeping those systems secure.

Conventional rule-based techniques detect vulnerabilities by matching the target source code against vulnerability patterns defined by experts. However, vulnerabilities in modern software often follow complex patterns, and the predefined rules of these techniques tend to be overly simple and rigid, which leads to low precision and many false positives. Learning-based techniques achieve better results than these traditional methods. Nevertheless, most of them only decide whether code is vulnerable at a coarse granularity (such as functions, files, or builds) and offer little finer-grained explanation of their results, so developers must still sift through large amounts of code to find the vulnerable statements that need to be fixed.

This thesis presents a novel deep learning-based code embedding technique that detects software vulnerabilities in C/C++ programs at the statement level by treating detection as a node classification task. Specifically, we apply a static slicing method based on reachability in the program dependence graph, using vulnerability-sensitive code elements as slicing criteria, to extract program slices from the source code and thereby remove statements unrelated to vulnerabilities from the samples. We represent source code with an enhanced code property graph and use its slice subgraphs as the model's detection granularity, which allows vulnerability features to be captured more effectively. We then combine a gated graph neural network (GGNN) with a Transformer based on the self-attention mechanism to capture both the local structure and the global context of the source code, substantially strengthening the model's feature extraction. Finally, we model slice-level vulnerability detection as a whole-graph classification task and statement-level detection as a node classification task, and train the two together as a multi-task learning model with a joint loss function. This design not only markedly improves detection precision but also provides finer-grained explanations that help developers quickly locate the vulnerable statements that must be fixed.

We evaluate the method on synthetic and real-world datasets and compare it with related work. On the synthetic SARD dataset, the method reaches an accuracy of 94.8% for vulnerability detection, and its F1 score is 1.2% to 44.5% higher than that of the other methods. On the real-world Big-Vul dataset, its F1 score is 5.7% to 26.6% higher than that of the other methods. For fine-grained localization of vulnerable lines, our method outperforms the explanation-based GNNExplainer method on all metrics.
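The slicing step described above can be illustrated with a small sketch of reachability-based slicing over a program dependence graph. The graph encoding (statement ids plus `(src, dst)` dependence edges), the short list of sensitive library calls, and the helper names `reachable` and `pdg_slice` are all assumptions made for illustration; the thesis does not enumerate its vulnerability-sensitive elements or say whether it keeps backward slices, forward slices, or both. Here the slice is the union of the backward slice (what the criteria depend on) and the forward slice (what depends on the criteria).

```python
import re
from collections import deque

# Hypothetical, deliberately small set of sensitive library calls used as slicing criteria.
SENSITIVE_CALL = re.compile(r"\b(strcpy|strcat|memcpy|sprintf|gets)\s*\(")


def reachable(adj, starts):
    """Breadth-first search: every node reachable from `starts` along `adj`."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def pdg_slice(statements, edges):
    """Return the sorted ids of statements kept by a backward + forward PDG slice.

    statements: {id: source text}
    edges: (src, dst) pairs meaning dst is data- or control-dependent on src.
    """
    criteria = [sid for sid, code in statements.items() if SENSITIVE_CALL.search(code)]
    succ, pred = {}, {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
        pred.setdefault(dst, []).append(src)
    # Backward slice follows predecessors; forward slice follows successors.
    kept = reachable(pred, criteria) | reachable(succ, criteria)
    return sorted(kept)


statements = {
    1: "char buf[10];",
    2: "int len = atoi(argv[1]);",
    3: "int count = 0;",
    4: "count++;",
    5: "memcpy(buf, argv[2], len);",
    6: 'printf("%d", count);',
    7: "buf[9] = 0;",
}
edges = [(1, 5), (2, 5), (3, 4), (4, 6), (5, 7), (1, 7)]
print(pdg_slice(statements, edges))  # [1, 2, 5, 7]
```

The statements that only touch `count` (3, 4, 6) have no dependence path to or from the `memcpy` call, so they are dropped, which is the noise reduction the slicing step aims for.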
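To make the "local structure" role of the GGNN concrete, here is a minimal pure-Python sketch of GGNN-style propagation: each node sums linearly transformed states of its incoming neighbours and then updates its own state with a GRU cell. This is a simplified assumption, not the thesis's model: it uses a single edge type, incoming messages only, tiny random weights, and hypothetical helper names (`ggnn_step`, `gru_update`, `random_params`). A trained model would learn the weights and would typically use separate transformations for the different edge types of the code property graph.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_update(a, h, p):
    """GRU cell: aggregated message a, previous state h, parameter dict p."""
    z = [sigmoid(x) for x in add(matvec(p["Wz"], a), matvec(p["Uz"], h))]  # update gate
    r = [sigmoid(x) for x in add(matvec(p["Wr"], a), matvec(p["Ur"], h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(x) for x in add(matvec(p["W"], a), matvec(p["U"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def ggnn_step(states, edges, p):
    """One propagation step: sum transformed incoming neighbour states, then GRU update."""
    dim = len(next(iter(states.values())))
    msgs = {v: [0.0] * dim for v in states}
    for src, dst in edges:
        msgs[dst] = add(msgs[dst], matvec(p["A"], states[src]))
    return {v: gru_update(msgs[v], states[v], p) for v in states}


def random_params(dim, seed=0):
    """Hypothetical small random weights; a real model learns these."""
    rng = random.Random(seed)
    return {
        name: [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
        for name in ("A", "Wz", "Uz", "Wr", "Ur", "W", "U")
    }


params = random_params(dim=4)
states = {v: [0.1 * v] * 4 for v in (1, 2, 5, 7)}  # toy initial node embeddings
edges = [(1, 5), (2, 5), (5, 7), (1, 7)]
for _ in range(3):  # three propagation steps
    states = ggnn_step(states, edges, params)
```

Because each new state is a gated mix of the old state and a `tanh` candidate, information spreads only one hop per step, which is why GGNN layers mainly capture local neighbourhoods.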
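The "global context" role of the Transformer comes from self-attention, in which every node attends to every other node regardless of graph distance. The sketch below shows single-head scaled dot-product attention over a list of node vectors in pure Python; the single head, the absence of residual connections and feed-forward layers, and the function name `self_attention` are simplifying assumptions, not details taken from the thesis.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def self_attention(states, wq, wk, wv):
    """Single-head scaled dot-product attention over all nodes of a slice graph.

    states: list of node vectors (for example, GGNN outputs).
    Returns the attended vectors and the attention weight matrix.
    """
    q = [matvec(wq, h) for h in states]
    k = [matvec(wk, h) for h in states]
    v = [matvec(wv, h) for h in states]
    scale = math.sqrt(len(q[0]))
    # Row i holds how strongly node i attends to every node j.
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]) for qi in q]
    out = [[sum(w * vj[d] for w, vj in zip(row, v)) for d in range(len(v[0]))] for row in weights]
    return out, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(nodes, identity, identity, identity)
```

Stacking this after message passing lets two distant statements in a slice (for example, a buffer declaration and a far-away copy into it) influence each other directly.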
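The abstract does not give the form of the joint loss. A common choice for this kind of multi-task setup, assumed here, is a weighted sum $\mathcal{L} = \mathcal{L}_{\text{graph}} + \lambda \, \mathcal{L}_{\text{node}}$, where the slice-level term is binary cross-entropy on the whole-graph prediction and the statement-level term is the mean binary cross-entropy over nodes. The weight `lam` and the helper names are hypothetical.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def joint_loss(graph_prob, graph_label, node_probs, node_labels, lam=1.0):
    """Slice-level (graph) loss plus lam times the mean statement-level (node) loss."""
    graph_loss = bce(graph_prob, graph_label)
    node_loss = sum(bce(p, y) for p, y in zip(node_probs, node_labels)) / len(node_probs)
    return graph_loss + lam * node_loss


# One vulnerable slice with three statements, only the second of which is vulnerable.
loss = joint_loss(0.9, 1, [0.2, 0.8, 0.1], [0, 1, 0], lam=0.5)
```

Sharing one encoder between the two heads means the statement-level labels give the slice classifier extra supervision, and vice versa.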
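Statement-level node classification yields a vulnerability probability for each statement, which can be turned into a ranked list of lines for a developer to inspect. The abstract does not name the localization metrics used against GNNExplainer, so the hit@k check below is only one assumed example of how such a ranking might be scored; `rank_statements` and `hit_at_k` are hypothetical helpers.

```python
def rank_statements(node_scores, k=3):
    """Return the k line numbers with the highest predicted vulnerability probability.

    Ties are broken by line number so the ranking is deterministic.
    """
    ranked = sorted(node_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [line for line, _ in ranked[:k]]


def hit_at_k(node_scores, vulnerable_lines, k=3):
    """1 if any ground-truth vulnerable line is among the top-k ranked lines, else 0."""
    return int(bool(set(rank_statements(node_scores, k)) & set(vulnerable_lines)))


scores = {1: 0.05, 2: 0.40, 5: 0.93, 7: 0.61}  # toy per-statement probabilities
print(rank_statements(scores, k=2))  # [5, 7]
```

Averaging `hit_at_k` over all vulnerable slices in a test set gives a single number for how often the true vulnerable line appears within the first k suggestions.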