Research On Software Defect Prediction Based On Ensemble Learning

Posted on: 2022-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2518306779984989
Subject: Automation Technology
Abstract/Summary:
With the rapid development of science and technology, the continued advance of informatization across society, and the wide application of information technology, every industry now depends on computer software, and software reliability is receiving growing attention. The fewer defects a piece of software contains, the more reliable it is and the more stable the system that runs it. Software defect prediction is therefore an important research topic. In practice, however, labeled samples are often too scarce to train a prediction model effectively. Defect data sets also suffer from severe class imbalance, and the skewed data distribution degrades the model's predictions. In addition, the sample data contain a large amount of redundant feature information, which further harms predictive performance. This thesis therefore addresses three problems that commonly arise in software defect prediction, namely insufficient labeled samples, class imbalance, and feature redundancy. It proposes the following solutions and evaluates them on the public NASA, AEEEM and MORPH data sets.

First, to address class imbalance in defect data, the thesis applies mixed sampling, combining undersampling with SMOTE oversampling, to reduce the imbalance of the data set. Comparisons against the original unsampled data and against random undersampling, random oversampling, and SMOTE oversampling used alone demonstrate that resampling is necessary and that mixed sampling is effective.

Second, to address feature redundancy, the thesis uses the SMA optimization algorithm to select the optimal features. Experimental comparisons with the original data without feature selection and with the PSO and GWO algorithms show that the data do contain redundant or irrelevant features and that the SMA optimization algorithm is superior.

Finally, to address the lack of sufficient labeled samples, the thesis introduces a clustering algorithm on top of the UDEED algorithm and proposes SUDAdaBoost, an improved software defect prediction method based on semi-supervised ensemble learning. The results show that SUDAdaBoost not only outperforms the original AdaBoost algorithm but also performs well in alleviating the class imbalance problem.
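The mixed sampling step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state sampling ratios, neighbour counts, or label encoding, so the function names (`smote`, `mixed_sample`), the `target` size per class, `k=5`, and the convention that label 0 is the non-defective majority and label 1 the defective minority are all assumptions made here for illustration.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """Return n_new synthetic rows, each interpolated between a minority
    row and one of its k nearest minority neighbours (Euclidean)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by distance to `base`.
        others = sorted((row for row in minority if row is not base),
                        key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def mixed_sample(X, y, target, k=5, seed=0):
    """Randomly undersample class 0 and SMOTE-oversample class 1 so that
    each class ends up with `target` rows (assumed label convention)."""
    rng = random.Random(seed)
    majority = [row for row, label in zip(X, y) if label == 0]
    minority = [row for row, label in zip(X, y) if label == 1]
    if not (2 <= len(minority) <= target <= len(majority)):
        raise ValueError("need 2 <= minority size <= target <= majority size")
    kept = rng.sample(majority, target)  # random undersampling
    new = smote(minority, target - len(minority), k, rng)  # oversampling
    X_out = kept + minority + new
    y_out = [0] * len(kept) + [1] * (len(minority) + len(new))
    return X_out, y_out
```

Calling `mixed_sample(X, y, target=10)` on a data set with 20 majority rows and 3 minority rows would return 10 rows of each class: 10 majority rows kept at random, the 3 original minority rows, and 7 synthetic minority rows. Choosing a `target` between the two class sizes is what makes the sampling "mixed", since neither class has to be discarded or synthesized all the way to the size of the other.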
Keywords/Search Tags: Software defect prediction, Semi-supervised learning, Ensemble learning, Data sampling, Feature selection