
An Automatic Vulnerability Data Collection And Processing System For Open-Source Software

Posted on: 2022-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: S M Li
Full Text: PDF
GTID: 2518306572497164
Subject: Computer technology
Abstract/Summary:
Vulnerabilities are the root cause of many cyberspace security incidents. Open source has become the mainstream trend in software, and security defects spread rapidly as open-source software is reused and iterated. To find and fix vulnerabilities as early as possible, source code vulnerability detection has become a research hotspot. Deep learning-based vulnerability detection can learn vulnerability representations on its own to generate detection models, reducing manual effort and improving the speed and capability of detection. However, the software security field lacks large-scale, ground-truth, effective vulnerability datasets. Artificially constructed vulnerability datasets have simple sample types and uniform features, which makes it difficult for them to support research on vulnerability detection in real software. Data on real vulnerabilities is scattered across hundreds of resources and websites, in differing forms and at differing levels of quality, which poses great challenges for vulnerability data collection. To address these problems, an automatic vulnerability data collection and update system for open-source software is proposed, and each of its modules is implemented. The system collects data from the U.S. National Vulnerability Database (NVD), parses and extracts vulnerability information, proposes an automatic collection model based on multi-source patches, summarizes the rules of patch publishing websites, and filters commits to collect patches. In the data processing stage, multi-source patches are analyzed for content and deduplicated, and patch files are processed into a consistent form. To address the low quality of vulnerability data, a mechanism that judges patch validity based on multiple types of information is presented. Further, the system combines the patches with open-source software source code to build a large-scale, fine-grained vulnerability sample library. The system can automatically collect NVD vulnerability information, monitor the release of new vulnerabilities, and dynamically expand the datasets, patch libraries, and vulnerability sample libraries. At present, an effective and fine-grained vulnerability dataset has been obtained, covering 4643 CVE vulnerabilities, 13420 file-level vulnerability samples, and 20824 function-level vulnerability samples, with a patch error rate of 7.5%. Compared with existing work, this dataset draws on broader data sources, carries richer vulnerability information, and has higher data quality. Moreover, a deep learning vulnerability detection experiment shows that a model trained on this dataset improves the accuracy of vulnerability detection in real software by 38.7% compared with a model trained on a manually constructed dataset.
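The patch collection step described above (rules for patch publishing websites, commit filtering, and redundancy removal) can be illustrated with a minimal sketch. It is not the thesis implementation: the two URL patterns, the `normalize` and `collect_patch_urls` helpers, and the idea of deduplicating by normalized URL are all assumptions made for illustration, and the actual rule set is not given in the abstract.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site rules: illustrative patterns for commit pages on two
# hosting sites. The rules summarized in the thesis are not known here.
PATCH_URL_RULES = [
    re.compile(r"^https://github\.com/[^/]+/[^/]+/commit/[0-9a-f]{7,40}$"),
    re.compile(r"^https://gitlab\.com/.+/-/commit/[0-9a-f]{7,40}$"),
]


def normalize(url: str) -> str:
    """Reduce equivalent spellings of a commit URL to one canonical form."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    # Treat ".patch" and ".diff" views of a commit as the commit itself.
    for suffix in (".patch", ".diff"):
        if path.endswith(suffix):
            path = path[: -len(suffix)]
    # Force https, lowercase the host, and drop any fragment.
    return urlunsplit(("https", parts.netloc.lower(), path, parts.query, ""))


def collect_patch_urls(references):
    """Keep reference URLs that match a site rule, without duplicates."""
    seen = set()
    patches = []
    for ref in references:
        url = normalize(ref)
        if url in seen:
            continue
        if any(rule.match(url) for rule in PATCH_URL_RULES):
            seen.add(url)
            patches.append(url)
    return patches


if __name__ == "__main__":
    # Example reference links of the kind attached to a vulnerability entry
    # (the URLs are made up for this sketch).
    refs = [
        "https://github.com/example/project/commit/abc1234",
        "http://GitHub.com/example/project/commit/abc1234.patch",
        "https://example.com/advisory/1",
    ]
    print(collect_patch_urls(refs))
```

Deduplicating by URL only catches the same commit referenced in different spellings; removing redundancy between patches published on different sites would need a comparison of patch content, which this sketch does not attempt.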
Keywords/Search Tags:Data Collection, Vulnerability Database, Vulnerability Detection, Vulnerability Patch