| With the rapid development of the internet, new software appears constantly. Such software makes life more convenient, but it is also becoming more complex and more prone to defects. Software defect prediction helps developers and testers find problems early and keep software systems running reliably. However, software defect prediction faces two problems: the class imbalance of software defect data and its high dimensionality. Both degrade prediction performance. To address them, this thesis proposes a software defect prediction method based on adaptive synthetic sampling and a denoising autoencoder. The main contents of this thesis are as follows. First, to address class imbalance in defect data sets, this thesis proposes an adaptive synthetic sampling method based on a genetic algorithm. Adaptive synthetic sampling uses the proportion of majority-class samples among the neighbors of each minority-class sample as a weight, so that more new samples are synthesized for the hard-to-learn minority samples; this shifts the decision boundary toward the hard-to-learn samples and reduces the bias caused by imbalanced learning. Then, using the individual evolution mechanism of the genetic algorithm, crossover and mutation are applied to the selected samples within adaptive synthetic sampling to generate new samples and balance the data set. Second, to address high feature dimensionality, this thesis proposes a feature representation based on a denoising autoencoder implemented as a neural network. The data is first corrupted with noise; the corrupted data is then fed into the network, which must reconstruct the original input through encoding and decoding. Introducing noise forces the network to learn a more robust encoding and improves the generalization ability of the model. Thus, when the original data set is input, the hidden layer of the autoencoder yields a representation that better captures the nature of the data, which addresses the high-dimensionality problem without reducing the number of data features. Finally, on the basis of adaptive synthetic sampling and the denoising autoencoder, a support vector machine is selected as the classifier to build the software defect prediction model. The method is validated on the NASA MDP data set and compared with methods proposed by other researchers, and the experimental results are analyzed. |
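
The weighting step of adaptive synthetic sampling described in the abstract can be illustrated with a minimal Python sketch. It follows the commonly used ADASYN formulation and is not the thesis's implementation: the function names `adasyn_weights` and `synthetic_counts`, the neighborhood size `k`, and the balance factor `beta` are assumptions introduced here for illustration. The genetic crossover and mutation step that generates the new samples is not shown, because the abstract does not specify its details.

```python
import math


def adasyn_weights(X, y, minority_label=1, k=5):
    """Return (minority indices, normalized weights).

    Each weight is the share of majority-class samples among the k nearest
    neighbors of a minority sample, normalized so the weights sum to 1.
    Assumes the data set holds at least two samples and k >= 1.
    """
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    if not minority_idx:
        return [], []

    ratios = []
    for i in minority_idx:
        # Rank every other sample by Euclidean distance to sample i.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        neighbors = others[:k]
        n_majority = sum(1 for j in neighbors if y[j] != minority_label)
        ratios.append(n_majority / len(neighbors))

    total = sum(ratios)
    if total == 0:
        # No minority sample has majority neighbors: fall back to uniform weights.
        uniform = 1.0 / len(minority_idx)
        return minority_idx, [uniform] * len(minority_idx)
    return minority_idx, [r / total for r in ratios]


def synthetic_counts(weights, n_majority, n_minority, beta=1.0):
    """Number of synthetic samples to generate per minority sample.

    beta = 1.0 (an assumed default) targets a fully balanced data set.
    """
    total_needed = (n_majority - n_minority) * beta
    return [round(w * total_needed) for w in weights]
```

Called on a feature matrix `X` (a list of numeric vectors) and labels `y`, `adasyn_weights` assigns larger weights to minority samples that are surrounded by majority samples, which are the hard-to-learn samples the abstract refers to. `synthetic_counts` then turns those weights into per-sample generation quotas.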