Research On Machine Learning Based Software Vulnerability Detection And Optimization Technologies

Posted on: 2023-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: T M Xiao
GTID: 2558307169479254
Subject: Engineering
Abstract/Summary:
As software grows larger and more complex, vulnerable code takes increasingly diverse forms. Traditional vulnerability detection methods cannot keep up with this diversity because they rely heavily on human effort and are weak at detecting unknown vulnerabilities. To improve the detection of unknown vulnerabilities, machine learning-based software vulnerability detection methods have received wide attention. Because the data within a single project is not sufficient to train a machine learning model, training must draw on vulnerability data from other projects. Detection methods that rely on vulnerability datasets from multiple projects are classified as cross-project vulnerability detection. Depending on the object under test, cross-project vulnerability detection can be divided into two scenarios: detecting vulnerabilities in a single project with the help of vulnerability data from multiple projects, and detecting vulnerabilities in the mixed data of multiple projects. This thesis investigates the code representation and sample imbalance problems that arise in these two scenarios, respectively. The specific contributions are as follows:

(1) For the scenario of detecting vulnerabilities in a single project with the help of vulnerability data from multiple projects, a vulnerability detection method based on the code property graph (VDCPG) is proposed to address the high false positive and false negative rates of existing methods. The method extracts the abstract syntax tree sequence and the control flow graph sequence from the code property graph of a function and uses them as the representation of that function, which reduces the information lost in the code representation. The method also uses a Bi-GRU to build the feature extraction model, which improves its ability to extract features from vulnerable code. Experimental results show that, compared with a method that represents code by the abstract syntax tree alone, this method improves accuracy and recall by 35% and 22%, respectively. It improves vulnerability detection in this scenario and effectively reduces the false positive rate and the false negative rate.

(2) For the scenario of detecting vulnerabilities in the mixed data of multiple projects, analysis of the mixed data revealed large differences in coding style between samples and an imbalance between positive and negative samples in the dataset. To solve these problems, this thesis optimizes VDCPG by eliminating the coding style differences, optimizing the representation method, and reducing the influence of sample imbalance. Based on VDCPG, this thesis proposes a vulnerability detection architecture for the mixed data of multiple projects, which can select different vulnerability detection models according to the actual detection requirements and thereby enhances detection capability and robustness. Experiments show that the optimized VDCPG method improves recall by 23% and 50% on the Devign and Reveal datasets, respectively, compared with the original VDCPG method.
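The evaluation quantities named in the abstract (accuracy, recall, false positive rate, false negative rate) and the idea of reducing the influence of sample imbalance can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the toy labels, and the choice of inverse-frequency class weighting are all assumptions made for illustration, since the abstract does not say which imbalance technique was used.

```python
from collections import Counter


def detection_metrics(y_true, y_pred):
    """Compute binary detection metrics; label 1 = vulnerable, 0 = not vulnerable."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Recall: share of truly vulnerable samples that were detected.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # False positive rate: share of clean samples flagged as vulnerable.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # False negative rate: share of vulnerable samples that were missed.
        "false_negative_rate": fn / (tp + fn) if (tp + fn) else 0.0,
    }


def inverse_frequency_weights(labels):
    """Assumed imbalance remedy: weight each class by n / (k * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Hypothetical toy data: 2 vulnerable samples among 8 functions.
y_true = [1, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]

metrics = detection_metrics(y_true, y_pred)
weights = inverse_frequency_weights(y_true)
```

With weights of this kind, errors on the rare vulnerable class count for more in a weighted training loss, which is one common way to raise recall on imbalanced data. The formula is the same heuristic scikit-learn uses for `class_weight="balanced"`.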
Keywords/Search Tags: machine learning, vulnerability detection, code property graph, code representation, cross-project