Research Of Software Defect Prediction Based On Abstract Syntax Tree Encoding

Posted on: 2021-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Cai
Full Text: PDF
GTID: 2518306464982869
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the amount of computer software has grown explosively, and its complexity and scale have increased as well. As a result, software defects are becoming more likely, and the harm caused by software reliability and quality problems is growing. Software defect prediction, a means of ensuring software quality, is used to predict whether a piece of code is defective. It can therefore not only reduce potential defects during software development but also help testers focus on the code snippets that are more likely to contain defects.

Many researchers have recently studied combining software defect prediction with artificial intelligence. This line of research uses the powerful feature extraction capabilities of artificial intelligence algorithms to capture the semantic features hidden in program code. The mainstream process can be summarized as converting the program code into an abstract syntax tree and feeding it into a neural network to detect defects. However, most such frameworks currently assign each kind of node on the abstract syntax tree a unique real-number code, traverse the tree to obtain a vector representation, and then feed that vector to the neural network. With this encoding, comparing the values of the codes is meaningless, so the semantic distance between sample programs cannot be measured, which degrades model training. Therefore, this paper proposes a method that exploits the characteristics of the abstract syntax tree and converts its nodes into vector representations, solving the problem of immeasurable semantic distance between abstract syntax trees.

The main work of this paper consists of the following three parts:
(1) Proposing Tree-based Embedding, an encoding method oriented to the abstract syntax tree. This method performs unsupervised training on the tree structure of the abstract syntax tree and represents each node as a vector. The Euclidean distance between the vectors of two nodes represents the semantic distance between those nodes.
(2) Verifying the effectiveness of Tree-based Embedding in within-project and cross-project defect prediction. A convolutional neural network is introduced as the supervised training model for software defect prediction, and transfer learning is introduced to address the large feature differences between source and target projects encountered in cross-project defect prediction.
(3) Designing and implementing a software defect prediction system based on Tree-based Embedding.
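
The contrast between the conventional node encoding and a tree-structured training signal can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes Python source and the standard-library `ast` module for convenience, and the helper names (`node_types_preorder`, `integer_encode`, `tree_context_pairs`) are hypothetical. The abstract does not say how Tree-based Embedding defines a node's context, so the choice below (parent type plus child types, in the spirit of the Continuous Bag of Words keyword) is only an assumption.

```python
import ast

# Tiny sample program; any valid Python source would do.
SOURCE = """
def add(a, b):
    return a + b
"""


def node_types_preorder(tree):
    """Return the AST node type names in pre-order (depth-first) order."""
    names = [type(tree).__name__]
    for child in ast.iter_child_nodes(tree):
        names.extend(node_types_preorder(child))
    return names


def integer_encode(names):
    """Conventional encoding: give each distinct node type an ID in first-seen order.

    The IDs carry no meaning: |id(x) - id(y)| says nothing about how
    semantically close node types x and y are.
    """
    vocab = {}
    for name in names:
        vocab.setdefault(name, len(vocab) + 1)
    return [vocab[name] for name in names], vocab


def tree_context_pairs(tree):
    """Collect (context, target) pairs from the tree structure.

    Assumption (not from the thesis): the context of a node is its parent's
    type followed by its children's types. Pairs like these could serve as
    training data for a CBOW-style unsupervised embedding model.
    """
    pairs = []

    def visit(node, parent):
        children = list(ast.iter_child_nodes(node))
        context = [type(parent).__name__] if parent is not None else []
        context += [type(child).__name__ for child in children]
        if context:
            pairs.append((context, type(node).__name__))
        for child in children:
            visit(child, node)

    visit(tree, None)
    return pairs


tree = ast.parse(SOURCE)
names = node_types_preorder(tree)
ids, vocab = integer_encode(names)
pairs = tree_context_pairs(tree)

if __name__ == "__main__":
    print(names)
    print(ids)
    for context, target in pairs[:5]:
        print(context, "->", target)
```

In the integer encoding, the numeric gap between two node types depends only on the order in which they were first seen. An embedding trained on pairs such as these would instead place node types that occur in similar tree contexts near each other, which is what makes a Euclidean distance between node vectors interpretable.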
Keywords/Search Tags: Software Defect Prediction, Abstract Syntax Tree, Continuous Bag of Words, Transfer Learning, Convolutional Neural Network