Classification Model In Software Engineering Based On Mainstream Static Analysis Reports

Posted on:2021-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:F Zhao

Full Text:PDF

GTID:2428330626461133

Subject:Applied statistics

Abstract/Summary:

In software engineering, source code analysis is an essential step before software goes online. Static analysis (SA) tools examine code for flaws without executing it and produce warnings ("alerts") about possible flaws. Mainstream static analysis tools such as Coverity, Klocwork and CppTest perform well at source code analysis. However, the reports these tools generate contain at least three times as many non-code errors as true code errors, which makes code review a large and complicated task. With the rapid development of artificial intelligence, classification models have become increasingly mature. This thesis therefore proposes using artificial intelligence algorithms to classify the alerts reported by static analysis tools, separating true code errors from non-code errors and greatly reducing the effort of manual classification.

First, for the different kinds of data features in the reports generated by static analysis tools, we apply corresponding feature engineering methods to extract more information. For natural-language features, we use natural language processing methods such as the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm and LSI (Latent Semantic Indexing). Ordered factors are converted to numeric values. Categorical features are one-hot encoded and then reduced in dimension in a way that preserves the information carried by the data. Next, because artificial intelligence algorithms require labeled training data and manual labeling is a huge workload, we propose using semi-supervised learning algorithms to label the training data, so that the classification accuracy of the final model is as high as possible without losing the validity of the data. Finally, we use LightGBM (Light Gradient Boosting Machine) to train the final classification model.

The software project used in this thesis contains 106,372 C++ files, including 65,363 code errors. We use Coverity, Klocwork, CppTest and CodeSonar to analyze the source code and generate reports. Experiments show that the proposed classification model, which is based on weak supervision, achieves good results on all four report sets, demonstrating an application of artificial intelligence in software engineering.
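The feature engineering described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the alert fields (`message`, `severity`, `checker`), the severity ordering, and the sample records are all hypothetical, and the TF-IDF weighting is the textbook tf × log(N/df) form, which the abstract does not specify. The LSI reduction, the semi-supervised labeling, and the LightGBM training step are left out.

```python
import math
from collections import Counter

# Hypothetical alert records; field names and values are illustrative
# assumptions, not the report schema of any real static analysis tool.
ALERTS = [
    {"message": "null pointer dereference of variable p",
     "severity": "high", "checker": "NULL_DEREF"},
    {"message": "unused variable tmp",
     "severity": "low", "checker": "UNUSED_VAR"},
    {"message": "possible null pointer passed to function",
     "severity": "medium", "checker": "NULL_DEREF"},
]

# Assumed ordering for the ordered factor (numeric conversion).
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def tfidf(docs):
    """Return (vocabulary, rows) weighted as tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        length = max(len(toks), 1)  # guard against empty messages
        rows.append([(counts[t] / length) * math.log(n_docs / df[t])
                     for t in vocab])
    return vocab, rows


def one_hot(values):
    """Return (categories, rows) with one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def build_features(alerts):
    """Concatenate text, ordered, and categorical features per alert."""
    vocab, text_rows = tfidf([a["message"] for a in alerts])
    cats, cat_rows = one_hot([a["checker"] for a in alerts])
    rows = [t + [SEVERITY_RANK[a["severity"]]] + c
            for a, t, c in zip(alerts, text_rows, cat_rows)]
    names = ([f"tfidf:{w}" for w in vocab]
             + ["severity_rank"]
             + [f"checker={c}" for c in cats])
    return names, rows


names, rows = build_features(ALERTS)
```

Each row of `rows` is one alert expressed as a numeric vector, with `names` giving the meaning of each column. In a fuller pipeline, a matrix like this would be reduced in dimension and passed to the classifier.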

Keywords/Search Tags:

Source code analysis, Static scanning tools, Semi-supervised algorithm, LightGBM

Related items

1	The Design And Implementation Of C++ Source Code Vulnerability Static Scanning System
2	A Method Based On Flow Analysis In Source Code Testing Tools
3	Research On Semi-supervised Clustering And Classification Algorithm
4	Design And Implementation Of JAVA Source Code Static Analysis System
5	Research And Implementation Of Security Vulnerability Detection In Application System Based On JAVA WEB Static Source Code Analysis
6	Research And Design Of PHP Code Automatic Vulnerable Detected Tools Based On Static Analysis Technology
7	Research And Implementation Of A C Language Source Code Static Detection Tool
8	The Analysis And Research Of Static Testing Technology Based On C/C++ Source Code
9	The Combination Of Static Analysis And Dynamic Monitoring Of Java Source Code Defect Detection Technology
10	Research On Weibo Sentiment Analysis Technology Based On Semi-supervised Learning