Design And Implementation Of Abstract Syntax Tree Based Code Defect Detection

Posted on: 2021-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Zhang
Full Text: PDF
GTID: 2518306308970229
Subject: Cyberspace security
Abstract/Summary:
With the rapid development of information technology, computers and computer software have become essential, everyday tools. As with any tangible product, the quality of software affects how well it functions. To make software more stable, many companies rely on security compliance checks, code review, unit testing and other methods to find bugs in the code. All of these are forms of software defect detection, and all of them are costly. An automatic method is therefore needed to find the code most likely to contain defects so that it can be checked first, allowing developers and testers to locate defects in a timely manner and to minimize the adverse effect of local defects on overall software quality. The technique that improves the efficiency of defect detection in this way is software defect prediction, which has become an indispensable part of defect detection in both research and practice.

Current research on software defect prediction falls into two groups. Traditional defect prediction designs metrics, drawn from the source code itself and from factors in the software development process, that can distinguish defective code. Deep-learning defect prediction instead aims to characterize the syntax and semantics of code automatically: it takes structures that carry code information (abstract syntax trees, control flow graphs, etc.), uses deep networks to extract features from them, and trains classifiers to separate defective from non-defective code.

This thesis proposes a software defect prediction technique based on the abstract syntax tree that uses semi-supervised learning to build the prediction model. The technique first pre-trains the model with an unsupervised method: a GPT model is trained on feature words and word vectors extracted from a large body of code in order to learn representations of the feature words. It then modifies the structure of the pre-trained model and trains it with a supervised method, comparing the defect labels the model predicts against the defect labels of the input source files, which yields the final software defect prediction model.

The main innovations are: (1) Semi-supervised learning is used to capture the syntactic and semantic information of code and apply it to defect prediction. This addresses the unsatisfactory and unstable results of traditional methods, which lack syntactic and semantic features and therefore cannot fully distinguish defective code. (2) Compared with supervised learning, semi-supervised learning addresses the shortage of labeled samples for training defect prediction models, as well as the poor fit and unsatisfactory evaluation metrics caused by heavily imbalanced samples. Compared with unsupervised learning, it makes fuller use of the sample data during model training, and it addresses the problem that word embeddings learned without supervision stay fixed while the defect prediction model is trained, so that the meaning they encode is not adequate for defect prediction. (3) The structure of the GPT model is adjusted so that it is better suited both to pre-training on defect prediction feature words and to training the defect prediction model.

A series of experiments compares the proposed technique with defect prediction using DBN and traditional classification models, and with ISDA, a defect prediction method based on optimized subclass discriminant analysis. For within-project defect prediction, the average F1 of the proposed method is 5.1% and 8.3% higher than that of DBN and ISDA, respectively. For cross-project defect prediction, it is 7.8% and 10.0% higher than that of DBN and ISDA, respectively.

Based on the proposed method, a prototype software defect prediction system was built on the abstract syntax tree and a browser/server (B/S) architecture. When a user submits the source code of a software project to create a defect prediction task, each file is sent to a trained defect prediction model, which outputs a label indicating whether the file contains defects. The results for all files are then aggregated and returned to the system. The system also lets users view and version-control the models used for defect prediction. The system is complete and easy to use.
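The abstract does not say how feature words are extracted from the abstract syntax tree, or which programming language the studied projects use. As a minimal sketch of the general idea only, the following assumes Python source, uses the standard-library `ast` module, and treats AST node-type names as the feature words. The helper names `ast_feature_words`, `build_vocab` and `encode` are hypothetical and are not taken from the thesis.

```python
import ast
from collections import Counter


def ast_feature_words(source: str) -> list[str]:
    """Return AST node-type names in depth-first, pre-order traversal.

    Assumption: node-type names stand in for the thesis's "feature words".
    """
    tree = ast.parse(source)
    words: list[str] = []

    def visit(node: ast.AST) -> None:
        # Skip Load/Store/Del context markers; they carry little structure.
        if isinstance(node, ast.expr_context):
            return
        words.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return words


def build_vocab(sequences: list[list[str]]) -> dict[str, int]:
    """Map each feature word to a positive integer id; 0 means unknown."""
    counts = Counter(word for seq in sequences for word in seq)
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(words: list[str], vocab: dict[str, int]) -> list[int]:
    """Turn a feature-word sequence into ids a sequence model could consume."""
    return [vocab.get(word, 0) for word in words]


sample = "def add(a, b):\n    return a + b\n"
words = ast_feature_words(sample)
vocab = build_vocab([words])
ids = encode(words, vocab)
print(words)
print(ids)
```

In a pipeline like the one described, integer sequences of this kind would be the input to pre-training, and one sequence per source file would later be paired with that file's defect label for supervised training.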
Keywords/Search Tags: software defect prediction, abstract syntax tree, semi-supervised learning, pre-training