Research On Static Code Defect Detection Based On Multi-Engine Fusion

Posted on: 2022-08-23

Degree: Master

Type: Thesis

Country: China

Candidate: Y F Wang

GTID: 2518306575962159

Subject: Communication and Information System

Abstract/Summary:

With the development of the software industry, detecting defects in source code during the software development lifecycle has become an urgent need for improving software quality, reducing software problems, and eliminating attack risks. Although many capable static code detection engines are available, they still suffer from shortcomings such as high false positive rates and low coverage of CWE defects, which cause problems for users in practice. Choosing the best static code detection engine has also turned out to be a serious challenge. In this dissertation, four commonly used static code defect detection engines, Clang Static Analyzer, CodeQL, Cppcheck, and Flawfinder, are systematically evaluated on the Juliet Test Suite, a standard defect test set, in terms of coverage, precision, recall, and F1. Based on this evaluation, the concept of plausible weights is proposed and used to aggregate and synthesize the outputs of these engines. The performance differences and improvements between the fused engine and the four individual engines are then evaluated comprehensively.

Because the output fused with plausible weights still contains many false positives, the dissertation uses machine learning with additional information to classify reports as true positives or false positives. First, KNN is chosen as the default classifier; it achieves an F1 value of 96.4% for positive reports. Next, GraphCodeBERT, a pre-trained model for programming languages, is used to extract and vectorize the context, function call relationships, and data flow of the code. Selected information from the engine outputs is then vectorized together with the GraphCodeBERT output and used as input to the KNN classifier. The results show that fusion with this approach improves the F1 value by 24.74% compared with the previous result and filters out more than 200,000 false positives. This shows experimentally that the approach is effective and that the fused results are more usable than the results of any individual detection engine.

Finally, a static code defect detection system using the above fusion engine is designed and implemented, and experiments confirm that the system provides the designed functions.
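
The evaluation metrics and the idea of weight-based fusion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the plausible weights are computed or combined, so the sketch makes its own assumptions: each engine gets one weight (for example, its measured precision on a benchmark), a finding is identified by a (file, line, CWE) tuple, and a finding is kept when the summed weights of the engines reporting it reach a threshold. The function names, weights, and threshold are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from raw counts.

    Returns 0.0 for any metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def fuse_reports(reports, weights, threshold):
    """Weighted vote over engine findings (an assumed fusion rule).

    reports:   dict mapping engine name -> set of (file, line, cwe) findings
    weights:   dict mapping engine name -> weight (hypothetical values)
    threshold: minimum summed weight for a finding to be kept
    """
    scores = defaultdict(float)
    for engine, findings in reports.items():
        for finding in findings:
            scores[finding] += weights[engine]
    return {finding for finding, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    # Hypothetical findings and weights, for illustration only.
    reports = {
        "clang": {("a.c", 10, "CWE-121")},
        "cppcheck": {("a.c", 10, "CWE-121"), ("b.c", 5, "CWE-476")},
        "flawfinder": {("b.c", 5, "CWE-476"), ("c.c", 7, "CWE-78")},
    }
    weights = {"clang": 0.6, "cppcheck": 0.5, "flawfinder": 0.2}
    print(fuse_reports(reports, weights, threshold=0.8))
    print(precision_recall_f1(tp=8, fp=2, fn=8))
```

In this sketch, a finding reported only by low-weight engines is dropped, while agreement between higher-weight engines keeps it. The fused set could then be scored with `precision_recall_f1` against labeled ground truth such as the Juliet Test Suite.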

Keywords/Search Tags:

Static Code Analysis, GraphCodeBERT, Machine Learning, System Design, Integration Optimization

Related items

1	The Design And Implementation Of C++ Source Code Vulnerability Static Scanning System
2	The Design And Implementation Of Static Code Analysis System Based On Machine Learning For Java
3	Research On Evaluation And Integrated Optimization Of Code Static Analysis Tools
4	Research And Implementation Of Automatic Code Defect Identification Based On Machine Learning
5	Design And Implementation Of Android Static Malware Detection System Based On Machine Learning
6	Research On Android Application Detection Technology Based On Static Code Analysis
7	Research On Machine Learning-Based Software Defect Identification
8	Research And Implementation Of A C Language Source Code Static Detection Tool
9	Design And Implementation Of JAVA Source Code Static Analysis System
10	Based On The Static Analysis Of Code Security Flaw Detection System