
Research On Static Code Defect Detection Based On Multi-Engine Fusion

Posted on: 2022-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
GTID: 2518306575962159
Subject: Communication and Information System
Abstract/Summary:
With the continued growth of the software industry, detecting defects in source code during the software development lifecycle has become an urgent need for improving software quality, reducing software failures, and eliminating attack risks. Although many capable static code analysis engines are available, they still suffer from shortcomings such as high false positive rates and low coverage of CWE defect types, which cause problems for users in practice. Choosing the most suitable static code analysis engine is itself a serious challenge.

This dissertation systematically evaluates four commonly used static code defect detection engines, Clang Static Analyzer, CodeQL, Cppcheck, and Flawfinder, on the Juliet Test Suite, a standard defect test set, using coverage, precision, recall, and F1 as metrics. Based on this evaluation, the concept of plausible weights is proposed and used to aggregate and synthesize the outputs of these engines. The performance differences and improvements between the fused engine and the four individual engines are then evaluated comprehensively.

Because the fused output obtained with plausible weights still contains many false positives, the dissertation uses machine learning with additional information to classify reports as true positives or false positives. First, KNN is chosen as the default classifier; it achieves an F1 value of 96.4% on positive reports. Next, GraphCodeBERT, a recent pre-trained model for programming languages, is used to extract and vectorize the code context, function call relationships, and data flow. Selected information from the engines' outputs is then vectorized and combined with the GraphCodeBERT output as input to the KNN classifier. The results show a 24.74% improvement in F1 value over the previous fusion result, with more than 200,000 false positives filtered out. This demonstrates experimentally that the approach is effective and that the fused results are more usable than those of any individual detection engine.

Finally, a static code defect detection system based on this fusion engine is designed and implemented, and experiments confirm that the system provides the designed functions.
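
As an illustration of the evaluation metrics and the weighted fusion idea described above, the following is a minimal Python sketch. It is not the dissertation's implementation: the abstract does not define how plausible weights are computed, so the sketch assumes each engine carries a fixed numeric weight and that a finding is kept when the summed weights of the engines reporting it reach a threshold. The function names, the `(file, line, cwe)` finding key, the engine names, the weights, and the threshold are all hypothetical.

```python
from collections import defaultdict


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def fuse_reports(reports, weights, threshold):
    """Weighted voting over engine findings.

    ASSUMPTION: this voting scheme is illustrative only; the source does not
    specify how plausible weights are derived or applied.
    reports: engine name -> set of (file, line, cwe) findings
    weights: engine name -> numeric weight
    """
    scores = defaultdict(float)
    for engine, findings in reports.items():
        for finding in findings:
            scores[finding] += weights[engine]
    # Keep only findings whose accumulated weight reaches the threshold.
    return {finding for finding, score in scores.items() if score >= threshold}


# Toy data with hypothetical engines and weights.
reports = {
    "engine_a": {("a.c", 10, "CWE-121")},
    "engine_b": {("a.c", 10, "CWE-121"), ("b.c", 5, "CWE-476")},
}
weights = {"engine_a": 0.6, "engine_b": 0.3}

fused = fuse_reports(reports, weights, threshold=0.5)
metrics = precision_recall_f1(tp=6, fp=2, fn=2)
```

In this toy run, the finding reported by both engines is kept, while the one reported only by the lower-weighted engine is dropped. A classifier such as KNN could then be applied to the fused findings to separate true positives from false positives, as the abstract describes.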
Keywords/Search Tags:Static Code Analysis, GraphCodeBERT, Machine Learning, System Design, Integration Optimization