Research On Software Defect Prediction Based On Ensemble Learning

Posted on:2022-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2518306779984989

Subject:Automation Technology

Abstract/Summary:

With the rapid development of science and technology and the spread of information technology across society, nearly every industry now depends on computer software, and software reliability is receiving growing attention. The fewer defects a piece of software contains, the more reliable it is and the more stable the system built on it. Software defect prediction is therefore an important research topic. In practice, however, three problems limit prediction models. Labeled samples are often too scarce to train a model effectively. Defect data sets suffer from serious class imbalance, and the skewed distribution distorts prediction results. The sample data also contains a large amount of redundant feature information, which further degrades predictive performance. This thesis addresses these three problems (insufficient labeled samples, class imbalance, and feature redundancy) with the following solutions, and evaluates them on the public NASA, AEEEM, and MORPH data sets.

First, to address class imbalance in the defect data, the thesis applies mixed sampling, combining undersampling with SMOTE oversampling, to reduce the imbalance of the data set. Comparisons with the unsampled original data and with random undersampling, random oversampling, and SMOTE oversampling used alone show that data resampling is necessary and that mixed sampling is effective.

Second, to address feature redundancy, the thesis uses the SMA optimization algorithm to select the optimal features. Experimental comparisons with the original data without feature selection and with the PSO and GWO algorithms confirm that the data contains redundant or irrelevant features and show that the SMA optimization algorithm performs better.

Finally, to address the lack of sufficient labeled samples, the thesis introduces a clustering algorithm on the basis of the UDEED algorithm and proposes SUDAdaBoost, an improved software defect prediction method based on semi-supervised ensemble learning. The results show that SUDAdaBoost not only outperforms the original AdaBoost algorithm but also performs well in alleviating the class imbalance problem.
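The mixed sampling idea mentioned in the abstract (undersampling the majority class, then SMOTE-style oversampling of the minority class) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `mixed_resample`, the parameters `target_per_class` and `k`, and the toy data are assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import math
import random


def mixed_resample(majority, minority, target_per_class, k=3, seed=0):
    """Undersample the majority class, then SMOTE-style oversample the minority.

    majority, minority: lists of feature vectors (lists of floats).
    target_per_class: hypothetical parameter, the size each class is moved toward.
    k: number of nearest minority neighbours used for interpolation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)

    # Step 1: random undersampling of the majority (non-defective) class.
    if len(majority) > target_per_class:
        majority_out = rng.sample(majority, target_per_class)
    else:
        majority_out = list(majority)

    # Step 2: SMOTE-style oversampling of the minority (defective) class.
    minority_out = [list(x) for x in minority]
    while len(minority_out) < target_per_class:
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (x for x in minority if x is not base),
            key=lambda x: math.dist(base, x),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # The synthetic sample lies on the segment between base and neighbour.
        synthetic = [b + gap * (n - b) for b, n in zip(base, neighbour)]
        minority_out.append(synthetic)

    return majority_out, minority_out


# Toy data (assumed for illustration): 20 non-defective, 4 defective modules.
majority = [[float(i), float(i % 5)] for i in range(20)]
minority = [[100.0, 1.0], [101.0, 2.0], [102.0, 1.5], [103.0, 2.5]]
maj, mino = mixed_resample(majority, minority, target_per_class=10)
print(len(maj), len(mino))
```

In this sketch both classes end with 10 samples. Synthetic minority points are interpolated between existing minority samples, so they stay inside the region the minority class already occupies.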

Keywords/Search Tags:

Software defect prediction, Semi-supervised learning, Ensemble learning, Data sampling, Feature selection

Related items

1	Research On Software Defect Prediction Method Based On Semi-supervised Integration
2	Research On Machine Learning Based Software Defect Prediction
3	Research On Software Defect Prediction For Cross-version Software
4	Selection And Classification Of Unbalanced Data Based On Semi-Supervised And Integrated Learning
5	Research On Software Evolvability Prediction Based On Semi-supervised Data Grouping
6	Software Defect Modeling And Prediction In Resource-constrained Scenarios
7	Feature Extraction Based Software Defect Prediction
8	Research On Software Defect Prediction Method Based On Machine Learning
9	Research On Semi-supervised Learning Based Software Defect Prediction
10	System Study Of Software Defect Mining With Weak Label