Research On Software Defect Prediction Method Based On Machine Learning

Posted on: 2020-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: P Shen
Full Text: PDF
GTID: 2428330599456772
Subject: Computer application technology
Abstract/Summary:
With the advent of the information age, software has become an indispensable part of production and daily life. As software requirements grow rapidly, the scale and complexity of software development continue to increase. Because engineers' development capabilities cannot always keep pace with these demands, software defects are inevitable. Hidden defects can cause incalculable losses, and these consequences have drawn attention to the importance of software quality. Software testing can effectively detect defects in software products, but constraints on project time, labor cost, and functional complexity mean that test engineers cannot consider every aspect of every module or cover the modules completely. Software defect prediction uses machine learning and related techniques to build a prediction model from a software product's historical metric data and then predict which modules are defective. It is an effective complement to software testing.

This thesis analyzes the characteristics of software defect prediction from the perspective of machine learning and proposes solutions to problems that defect prediction research faces in current practical applications. The main problems are as follows:

(1) Defect data sets contain partially redundant or irrelevant features, which seriously degrade the performance of the defect prediction model.
(2) Software defect data is class-imbalanced and only partially labeled. In practice, the proportions of positive and negative samples differ greatly and not all samples carry class labels, so traditional supervised learning cannot meet the needs of prediction model construction.
(3) A new project lacks enough historical defect data to train a prediction model, so the model cannot be built and validated on independent and identically distributed training and test sets.
(4) Going forward, the traditional single-machine model can no longer meet the storage and computation needs of large-scale software defect data.

The specific measures proposed in this study are as follows:

(1) To address redundant or irrelevant features in defect data sets, this thesis proposes a stable feature selection method based on relevance and redundancy (RRSFS). Under k-fold cross-validation, RRSFS performs a two-stage, multi-algorithm fusion that selects the optimal feature subset according to the redundancy among features and the correlation between features and classes. RRSFS not only reduces the computational cost of the modeling process but also improves the stability of feature selection.
(2) To address class imbalance and partial labeling in software defect data, this thesis proposes a semi-supervised software defect prediction method based on sampling and integration, that is, ensemble learning (SISDP). SISDP first builds a robust KNN labeling model from a class-balanced sample and uses it to label a batch of unlabeled data. The newly labeled data is then added to the original data set to build the next labeling model, and this repeats until all data is labeled. From the fully labeled data set, a training set is obtained with a hybrid sampling algorithm and used to train an ensemble classification model composed of multiple classification algorithms. SISDP not only reduces the interference of the minority class in the labeling process but also improves the generalization ability of the defect prediction model.
(3) To address the lack of sufficient historical defect data for training a prediction model on a new project, this thesis proposes a transfer learning algorithm based on a convolutional neural network (CNNTL) for defect prediction. The method divides the transfer learning process into two tasks, A and B. In task A, the features of the source project data set are raised to a higher dimension and fed into the network for preliminary training. The convolutional-layer weights learned from the source project data set in task A are then applied to the convolutional-layer training on the target project data set in task B, which achieves the transfer. CNNTL has strong feature transfer ability, a short training time, and good prediction performance.
(4) To address the storage and computation of large-scale software defect data, this thesis proposes a distributed defect prediction algorithm based on a neural network (NNDDP). The method runs on a computer cluster. The defect data to be processed is stored in the Hadoop Distributed File System (HDFS) and preprocessed to obtain the training and test sets. The data is then partitioned and distributed to multiple servers for synchronous parallel training. Finally, a parameter server aggregates the local parameters to train a global defect prediction model. NNDDP can handle large-scale data sets and also has good predictive performance.

To assess the feasibility of the proposed algorithms, this thesis carries out comparative experiments for each of them. To assess their stability, it also runs multiple batches of experiments on different data sets. The experimental results show that the proposed algorithms achieve better performance on software defect prediction data sets. This research offers a new approach to software defect prediction and is significant for improving software quality and software reliability.
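
The relevance and redundancy criterion behind measure (1) can be illustrated with a small sketch. This is not RRSFS itself: the two-stage multi-algorithm fusion and the k-fold cross-validation are left out, and a simple greedy rule based on Pearson correlation is swapped in as an assumption. The function names (`pearson`, `select_features`) and the toy metric columns are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_a * var_b)


def select_features(columns, labels, n_select):
    """Greedily pick features with high relevance and low redundancy.

    relevance  = |corr(feature, class label)|
    redundancy = mean |corr(feature, already selected feature)|
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    selected = []
    remaining = list(columns)
    while remaining and len(selected) < n_select:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(columns[name], columns[chosen])) for chosen in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical module metrics; "loc_dup" is nearly a copy of "loc".
columns = {
    "loc": [1, 2, 3, 4, 5, 6],
    "loc_dup": [1, 2, 4, 3, 5, 6],
    "churn": [2, 1, 2, 3, 2, 3],
}
labels = [0, 0, 0, 1, 1, 1]  # 1 = defective module
print(select_features(columns, labels, 2))  # ['loc', 'churn']
```

In this toy data the near-duplicate column is skipped in favor of a less correlated one, which is the behavior a relevance and redundancy criterion is meant to produce.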
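
The iterative labeling loop described in measure (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SISDP implementation: balancing is done by plain random undersampling, the KNN uses Euclidean distance with a majority vote, and the later hybrid sampling and ensemble training stage is omitted. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(reference, query, k=3):
    """Majority vote among the k nearest labeled points (Euclidean distance)."""
    nearest = sorted(reference, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def balanced_sample(labeled, rng):
    """Undersample every class down to the size of the smallest class."""
    by_class = {}
    for features, label in labeled:
        by_class.setdefault(label, []).append((features, label))
    size = min(len(rows) for rows in by_class.values())
    sample = []
    for rows in by_class.values():
        sample.extend(rng.sample(rows, size))
    return sample


def self_label(labeled, unlabeled, batch_size=2, k=3, seed=0):
    """Label the unlabeled points batch by batch.

    Each round draws a fresh balanced sample from everything labeled so far,
    labels one batch with KNN, and adds the batch to the labeled pool.
    """
    rng = random.Random(seed)
    labeled = list(labeled)
    pending = list(unlabeled)
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        reference = balanced_sample(labeled, rng)
        labeled.extend((x, knn_predict(reference, x, k)) for x in batch)
    return labeled


# Toy data: four clean modules (label 0) and two defective ones (label 1).
labeled = [
    ((1.0, 1.0), 0), ((1.2, 0.9), 0), ((0.8, 1.1), 0), ((0.9, 0.8), 0),
    ((5.0, 5.0), 1), ((5.2, 4.9), 1),
]
unlabeled = [(1.1, 1.0), (5.1, 5.1), (0.9, 1.0), (4.8, 5.0)]
result = self_label(labeled, unlabeled)
print([label for _, label in result[len(labeled):]])  # [0, 1, 0, 1]
```

Drawing a balanced reference set in every round keeps the larger class from dominating the neighbor vote, at the cost of ignoring some labeled points in each round.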
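
The synchronous parameter aggregation in measure (4) can be simulated in a single process. The sketch below is an assumption-laden illustration, not NNDDP: it uses logistic regression in place of the neural network, plain Python lists in place of HDFS and a cluster, and a size-weighted average as the aggregation rule. The function names and toy shards are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, shard, lr=0.5):
    """One worker: a single gradient step of logistic regression on its shard."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        x = [1.0, *features]  # leading 1.0 is the bias input
        error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - label
        for j, xi in enumerate(x):
            grad[j] += error * xi
    return [w - lr * g / len(shard) for w, g in zip(weights, grad)]


def aggregate(local_weights, shard_sizes):
    """Parameter server: size-weighted average of the workers' parameters."""
    total = sum(shard_sizes)
    return [
        sum(w[j] * n for w, n in zip(local_weights, shard_sizes)) / total
        for j in range(len(local_weights[0]))
    ]


def train(shards, n_features, rounds=500):
    """Run synchronous rounds; every worker starts from the same global weights."""
    weights = [0.0] * (n_features + 1)
    for _ in range(rounds):
        local = [local_update(weights, shard) for shard in shards]
        weights = aggregate(local, [len(shard) for shard in shards])
    return weights


def predict(weights, features):
    x = [1.0, *features]
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x))) >= 0.5)


# Two hypothetical workers, each holding part of a one-feature data set.
shards = [
    [((0.1,), 0), ((0.2,), 0), ((0.9,), 1)],
    [((0.8,), 1), ((0.3,), 0), ((0.7,), 1)],
]
global_weights = train(shards, n_features=1)
print(predict(global_weights, (0.15,)), predict(global_weights, (0.85,)))
```

Because each worker takes exactly one step per round from the same starting point, the size-weighted average equals one full-batch gradient step over all shards. Real deployments usually take several local steps between synchronizations, where that equivalence no longer holds.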
Keywords/Search Tags:Machine Learning, Defect Prediction, Feature Selection, Sampling and Integration, Transfer Learning, Parallel Training