Research On Sample Annotation And Structural Characterization Model For Vulnerability Detection

Posted on: 2021-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Chen
GTID: 2518306107960729
Subject: Computer application technology
Abstract/Summary:
In recent years, frequent cyberspace security incidents have had an immeasurable impact on society, making vulnerability detection for software systems increasingly important. Deep learning has attracted wide attention for its powerful modeling and learning ability, and researchers have applied it to representation learning on source code in order to build vulnerability detection models. However, the field suffers from an extreme shortage of real software vulnerability data sets for training such models. Most usable data is currently produced by hand, which is inefficient and costly. In addition, most existing deep-learning-based detection methods use linear models that rely on the textual information of the source code and ignore its grammatical structure, so they lose syntactic and semantic information and miss many vulnerability characteristics.

To address these problems, this thesis proposes Gen Do HE, a heuristic-rule-based labeling strategy for real software vulnerability samples, and Astor, a deep learning vulnerability detection model based on structural representation. Gen Do HE first analyzes the source code and vulnerability information of open source software, then generates heuristic rules from the analysis results, and finally uses those rules to automatically mark the vulnerable lines of the source code, thereby building a real software vulnerability data set. Astor first extracts fine-grained samples from the source code, then builds a structural representation of each sample based on its abstract syntax tree, and finally uses a bidirectional gated recurrent neural network to learn from that representation, so that it can accurately capture the syntactic structure and semantic information carried by the source code.

In the experiments, the effectiveness of Gen Do HE was first verified on the vulnerability files of two open source software projects, and the performance of Astor was then analyzed on various types of data. The results show that: (1) Gen Do HE can effectively alleviate the shortage of real software data sets in the field of vulnerability detection; (2) Astor is an effective and practical vulnerability detection model that meets the current needs of the field; (3) because it is based on structural representation, Astor detects vulnerabilities more effectively in data with long code, complex types, and richer semantic information; (4) compared with the traditional linear representation model, Astor achieves better detection, with an 8.9% decrease in the overall false negative rate and an almost 2.0% increase in F1 score. However, because structural representation is computationally expensive, Astor takes relatively long to train.
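The structural representation step described for Astor can be illustrated with a small sketch. The abstract does not give the thesis's actual encoding or target language, so everything here is an assumption: Python's standard `ast` module stands in for whatever parser the thesis uses, the function name `structural_tokens` is hypothetical, and the depth-prefixed pre-order encoding is just one simple way to flatten a syntax tree into a sequence that a recurrent network could consume.

```python
import ast
from typing import List


def structural_tokens(source: str) -> List[str]:
    """Flatten the AST of `source` into depth-tagged node-type tokens.

    Hypothetical helper for illustration only; not the thesis's encoding.
    """
    tree = ast.parse(source)
    tokens: List[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        # Pre-order: record the node before its children, so the
        # nesting structure survives in the flat sequence.
        tokens.append(f"{depth}:{type(node).__name__}")
        for child in ast.iter_child_nodes(node):
            visit(child, depth + 1)

    visit(tree, 0)
    return tokens


# A fine-grained sample: one small function.
sample = "def copy(buf, n):\n    return buf[:n]\n"
print(structural_tokens(sample))
```

Unlike a plain token stream of the source text, this sequence records which constructs are nested inside which, which is the kind of syntactic information the abstract says linear text-based models lose. In a full pipeline the tokens would be mapped to integer ids and fed to the recurrent network.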
Keywords/Search Tags: Heuristic rules, vulnerability detection, structural representation, abstract syntax trees, deep learning