Research on Intelligent Vulnerability Detection Methods Based on Scalable Code Metrics

Posted on: 2022-12-13 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Wang
GTID: 2518306776494374 | Subject: Computer Software and Application of Computer
Abstract/Summary:
As software grows in scale, its architecture becomes increasingly complex, and security vulnerabilities become more prevalent, leaving software exposed to malicious attacks. Vulnerability detection has therefore become a research hotspot in software security. Most traditional vulnerability detection methods rely on security-flaw rules supplied by experts and on manually defined vulnerability features. However, as software systems grow and new, varied vulnerabilities emerge, this approach not only incurs high labor costs; the subjectivity of experts in defining rules and features also affects the false-positive and false-negative rates. With the rise of artificial intelligence, and given the strong capability of machine learning and deep learning to characterize large-scale data, many AI-based vulnerability detection studies have emerged. However, most existing methods do not consider a multidimensional code representation, which leads to problems such as information loss when detecting vulnerabilities in source code. To address these problems, this thesis proposes an intelligent vulnerability detection method based on scalable code metrics (Scalable Vulnerability Detection Method, SVDM). The main work is as follows.

(1) To address the lack of semantic information in current code representations, this thesis introduces function-granularity code metrics, line-granularity code metrics and semantic metrics, and combines them into a multi-scale code metric that characterizes the source code. First, the source code is preprocessed into code slices, and the slices are converted into one-dimensional feature sequences with the continuous bag-of-words (CBOW) technique; these sequences serve as the semantic metrics. Second, the function-granularity and line-granularity code metrics are constructed by computing and counting, respectively, the text metrics and the complexity metrics of the code slices. This multi-scale characterization accounts for text, vocabulary and complexity metrics while also extracting the contextual semantics of the source code, and it provides a reliable code representation for deep-learning-based vulnerability detection.

(2) Inspired by the Feature Pyramid Network (FPN) from image processing, this thesis designs a multi-scale feature network (SFN) for vulnerability feature extraction. SFN is a three-layer vertical feature-extraction network composed of one Bi-LSTM layer and two convolutional neural network (CNN) layers. The Bi-LSTM layer takes the semantic metric of the multi-scale code metric as input, and the two CNN layers take the function-granularity metric and the line-granularity metric as input, respectively. This one-to-one correspondence between metrics and extraction layers is designed to ensure that all of the source code's feature information is transformed into feature vectors. The feature vectors of different scales are then combined through feature fusion, which improves the detection model's ability to perceive vulnerability features and thus the effectiveness of the detection method.

(3) A Bi-LSTM-based vulnerability representation learning network is designed and implemented. It consists of an input layer, a Bi-LSTM layer, a dense layer and an output layer. The input layer receives the feature vectors output by the multi-scale feature network. The Bi-LSTM layer uses its sequence-processing capability to learn representations from these vectors and obtain high-level semantic features of the source code, while also alleviating vanishing gradients and the weak modeling of long-term dependencies. The dense layer reduces the dimensionality of the Bi-LSTM output, and a Softmax (multinomial logistic regression) layer normalizes the detection results and classifies the learned features to detect vulnerabilities.

Finally, SVDM is evaluated on a dataset (Multi-SET) containing two vulnerability types, CWE-119 and CWE-399. The experimental results show that SVDM achieves an overall accuracy of 84.3% and a recall of 83.4% on Multi-SET, with both the false-negative and false-positive rates below 17%. Compared with existing datasets and vulnerability detection methods, the multi-scale code metric constructed in this thesis has a more comprehensive characterization capability, and SVDM achieves higher precision and a lower false-positive rate.
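Contribution (1) turns code slices into one-dimensional feature sequences with CBOW. The stdlib-only Python sketch below illustrates the two ideas involved: CBOW training pairs, in which the tokens in a window around a position predict the token at that position, and the CBOW hidden layer, which averages the context vectors. It does not train a model; the `embeddings` dictionary stands in for vectors that a trained CBOW model would supply (for example gensim's `Word2Vec` with `sg=0`). The function names, the window size, the flattening-and-padding layout in `slice_to_sequence` and the zero vector for unknown tokens are all assumptions, since the abstract does not say how the sequences are laid out.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cbow_pairs(tokens: Sequence[str], window: int) -> List[Tuple[List[str], str]]:
    """Build CBOW (context, target) pairs: surrounding tokens predict the centre one."""
    pairs: List[Tuple[List[str], str]] = []
    for i, target in enumerate(tokens):
        context = list(tokens[max(0, i - window):i]) + list(tokens[i + 1:i + 1 + window])
        if context:  # a single-token slice has no context to learn from
            pairs.append((context, target))
    return pairs


def cbow_hidden(context: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """CBOW hidden layer: element-wise mean of the context token vectors."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for tok in context:
        for k, value in enumerate(embeddings[tok]):
            total[k] += value
    return [v / len(context) for v in total]


def slice_to_sequence(tokens: Sequence[str], embeddings: Dict[str, Vector],
                      max_len: int) -> Vector:
    """Hypothetical layout: concatenate token vectors into one flat sequence,
    truncating or zero-padding to exactly max_len tokens."""
    dim = len(next(iter(embeddings.values())))
    seq: Vector = []
    for tok in list(tokens)[:max_len]:
        seq.extend(embeddings.get(tok, [0.0] * dim))  # unknown token -> zero vector
    seq.extend([0.0] * dim * (max_len - min(len(tokens), max_len)))  # padding
    return seq


if __name__ == "__main__":
    # Toy 2-d vectors standing in for trained CBOW embeddings (hypothetical values).
    emb = {"buf": [1.0, 0.0], "[": [0.0, 1.0], "i": [3.0, 2.0], "]": [0.0, -1.0]}
    tokens = ["buf", "[", "i", "]"]
    print(cbow_pairs(tokens, window=1))
    print(cbow_hidden(["buf", "i"], emb))
    print(slice_to_sequence(tokens, emb, max_len=6))
```

Flattening keeps every slice the same length, which a downstream recurrent layer with a fixed input size needs; a per-token matrix would work equally well and is the other common choice.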
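Contribution (1) also builds line-granularity and function-granularity metrics from text and complexity measures, but the abstract does not list which measures. The sketch below assumes a few common ones: character and token counts, distinct tokens (vocabulary size), and a McCabe-style cyclomatic complexity estimated as one plus the number of decision points. The regular-expression tokenizer, the `DECISION_TOKENS` set and all function names are hypothetical; a real implementation would work on a parser's output rather than raw text, because this version also counts tokens inside comments and string literals.

```python
import re
from typing import Dict, List

# Hypothetical C-like tokenizer: identifiers, integer literals, common
# two-character operators, then any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|->|\+\+|--|&&|\|\||\S")

# Tokens treated as decision points for a McCabe-style complexity estimate.
DECISION_TOKENS = {"if", "for", "while", "case", "&&", "||", "?"}


def tokenize(code: str) -> List[str]:
    return TOKEN_RE.findall(code)


def line_metrics(line: str) -> Dict[str, int]:
    """Line-granularity metrics for one source line."""
    toks = tokenize(line)
    return {
        "chars": len(line.strip()),
        "tokens": len(toks),
        "distinct_tokens": len(set(toks)),
        "decisions": sum(t in DECISION_TOKENS for t in toks),
    }


def function_metrics(source: str) -> Dict[str, int]:
    """Function-granularity metrics for a whole function body."""
    toks = tokenize(source)
    return {
        "loc": sum(1 for ln in source.splitlines() if ln.strip()),  # non-blank lines
        "tokens": len(toks),
        "vocabulary": len(set(toks)),
        # Cyclomatic complexity approximated as decision points + 1.
        "cyclomatic": sum(t in DECISION_TOKENS for t in toks) + 1,
    }


if __name__ == "__main__":
    func = (
        "int f(int n) {\n"
        "    if (n > 0 && n < 10)\n"
        "        return 1;\n"
        "    return 0;\n"
        "}"
    )
    print(function_metrics(func))
    print([line_metrics(ln) for ln in func.splitlines()])
```

For the five-line function in the `__main__` block, `function_metrics` reports 24 tokens, a vocabulary of 16 distinct tokens and an estimated cyclomatic complexity of 3 (one for the `if`, one for the `&&`, plus one).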
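Contributions (2) and (3) fuse the outputs of the three SFN branches and end in a dense layer followed by Softmax. The abstract does not say which fusion operator is used; the sketch below assumes plain concatenation (FPN itself merges pyramid levels by element-wise addition, which requires equal-sized vectors). It shows only the arithmetic of fusion, a dense layer and a numerically stable Softmax in pure Python. The Bi-LSTM that sits between fusion and the dense layer is omitted, and the branch outputs and weights are hypothetical placeholders for values a trained network would produce.

```python
import math
from typing import List, Sequence


def fuse(*branches: Sequence[float]) -> List[float]:
    """Feature fusion by concatenation (an assumption; the thesis's operator is unspecified)."""
    fused: List[float] = []
    for branch in branches:
        fused.extend(branch)
    return fused


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per weight row, y = W x + b."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the input length")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax with max-subtraction so large logits do not overflow math.exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    semantic = [0.2, -0.1]   # stand-in for the Bi-LSTM branch output
    func_level = [0.5]       # stand-in for the function-granularity CNN branch
    line_level = [0.3, 0.0]  # stand-in for the line-granularity CNN branch
    x = fuse(semantic, func_level, line_level)
    # Hypothetical 2x5 weights mapping the fused vector to two class scores
    # (index 0 = not vulnerable, index 1 = vulnerable).
    W = [[0.1, 0.0, -0.2, 0.0, 0.1],
         [-0.1, 0.3, 0.4, 0.2, 0.0]]
    probs = softmax(dense(x, W, [0.0, 0.0]))
    print(probs, "vulnerable" if probs[1] > probs[0] else "not vulnerable")
```

Concatenation lets the three branches keep different output sizes, which suits branches fed with metrics of different granularity; addition would force them to a common width first.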
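The results paragraph reports accuracy, recall, precision, false-negative rate (FNR) and false-positive rate (FPR). These all follow from the confusion matrix of a binary classifier, as in the sketch below; the class name is illustrative, and the counts in the example are made up and are not taken from Multi-SET.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix; 'positive' means 'vulnerable'."""
    tp: int  # vulnerable code flagged as vulnerable
    fp: int  # safe code flagged as vulnerable
    tn: int  # safe code passed as safe
    fn: int  # vulnerable code missed

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:  # true-positive rate
        return self.tp / (self.tp + self.fn)

    def fnr(self) -> float:  # false-negative rate = 1 - recall
        return self.fn / (self.tp + self.fn)

    def fpr(self) -> float:  # false-positive rate
        return self.fp / (self.fp + self.tn)


if __name__ == "__main__":
    c = Confusion(tp=8, fp=2, tn=6, fn=4)  # hypothetical counts
    print(c.accuracy(), c.precision(), c.recall(), c.fnr(), c.fpr())
```

Because FNR and recall share the denominator TP + FN, FNR = 1 - recall; the reported recall of 83.4% therefore corresponds to an FNR of 16.6%, consistent with the stated bound of 17%.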
Keywords: Vulnerability Detection, Multi-scale, Code Metrics, Deep Learning