Aggregated Software Defect Prediction Model Based On Block-regularized Cross-validation

Posted on:2022-04-11

Degree:Master

Type:Thesis

Country:China

Candidate:H J Ding

GTID:2518306509970129

Subject:Computer Science and Technology

Abstract/Summary:

Conventional software defect prediction (SDP) models are typically built from static software metrics combined with popular machine learning algorithms. Many studies have shown that ensemble learning algorithms such as bagging and random forest achieve promising predictive performance on the SDP task. However, both bagging and random forest rely on bootstrap sampling when constructing the SDP model. Because defective labels in SDP data sets are naturally imbalanced, bootstrap sampling can skew the label distribution further and degrade both the predictive performance and the stability of the resulting SDP models. Therefore, this study employs a block-regularized m×2 cross-validation (m×2 BCV) to construct an efficient aggregated SDP model based on majority voting. Extensive experiments on 16 cross-version defect prediction (CVDP) data sets and 22 cross-project defect prediction (CPDP) data sets, using decision trees and random forests as base classifiers, verify the effectiveness of the proposed SDP model.

An m×2 BCV is a regularized version of conventional random m×2 cross-validation: it constrains the data partitioning with certain regularization conditions. When splitting an SDP data set, an m×2 BCV satisfies an acknowledged partitioning principle: the distribution of the training set should be close to that of the validation set. A random under-sampling (RUS) strategy is used during splitting, and four novel regularization conditions are introduced. The resulting 2m subsets are used to train multiple base SDP models, which are then combined into the proposed aggregated SDP model via majority voting. The F₁ score is used to evaluate the predictive performance of an SDP model, a Bayes test on the F₁ score is introduced to compare two SDP models, and the length of the credible interval of the F₁ score is used to measure the stability of an SDP model.

The experimental results support the following conclusions. 1) Compared with an AdaBoost SDP model and a bagging SDP model, the proposed m×2 BCV aggregated SDP model improves the F₁ score in the majority of the CVDP and CPDP experiments. Furthermore, the stability of the F₁ score is significantly improved across all experiments, as shown by the shorter credible intervals of the F₁ score. 2) An entropy measure is introduced to evaluate how well a model distinguishes the important metrics. The results show that the m×2 BCV aggregated SDP model with random forest as the base classifier is more stable than the traditional SDP model built from an individual random forest. 3) A larger-the-better signal-to-noise ratio is used as the optimization objective when tuning the hyperparameters of an SDP model. The results show that this approach is effective in the majority of the CVDP and CPDP experiments.

In conclusion, the proposed m×2 BCV aggregated SDP model offers advantages in tuning hyperparameters, distinguishing important metrics, raising the F₁ score of an SDP model, and improving the stability of an SDP model. The proposed method suggests a new direction for aggregated SDP methods.
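The aggregation pipeline described above (m repetitions of a two-fold split, random under-sampling of each half, one base model per subset, majority voting over the 2m models) can be illustrated with a minimal sketch. It rests on several assumptions and is not the thesis implementation: the abstract does not state the four regularization conditions, so a plain stratified random split is substituted for the block-regularized partition; a single-feature threshold rule stands in for the decision tree and random forest base classifiers; ties among the 2m votes are resolved as non-defective; and all function names and the toy data are hypothetical.

```python
import random

def stratified_halves(labels, rng):
    """Split indices into two halves with near-equal label proportions.
    Assumption: a plain stratified split, NOT the block-regularized partition."""
    halves = ([], [])
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        mid = len(idx) // 2
        halves[0].extend(idx[:mid])
        halves[1].extend(idx[mid:])
    return halves

def under_sample(idx, labels, rng):
    """Random under-sampling: shrink every class to the minority class size."""
    by_cls = {}
    for i in idx:
        by_cls.setdefault(labels[i], []).append(i)
    n = min(len(v) for v in by_cls.values())
    return [i for v in by_cls.values() for i in rng.sample(v, n)]

def fit_stump(X, y):
    """Toy base learner (stand-in for a decision tree): the single-feature
    threshold rule with the highest training accuracy."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [int(sign * (row[f] - t) >= 0) for row in X]
                acc = sum(p == yi for p, yi in zip(pred, y))
                if best is None or acc > best[0]:
                    best = (acc, f, t, sign)
    _, f, t, sign = best
    return lambda row: int(sign * (row[f] - t) >= 0)

def fit_aggregated(X, y, m=3, seed=0):
    """Train 2m base models (m two-fold splits) and combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(m):
        for half in stratified_halves(y, rng):
            idx = under_sample(half, y, rng)
            models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        defect_votes = sum(model(row) for model in models)
        # Assumption: a tie among the 2m votes is resolved as non-defective (0).
        return int(2 * defect_votes > len(models))

    return predict, models

# Hypothetical imbalanced toy data: one metric, 20 clean and 6 defective modules.
X = [[v] for v in range(1, 21)] + [[v] for v in (30, 32, 35, 40, 45, 50)]
y = [0] * 20 + [1] * 6
predict, models = fit_aggregated(X, y, m=3)
print(len(models), predict([5]), predict([60]))
```

Because each half is under-sampled rather than bootstrapped, every base model sees a balanced subset without duplicated rows, which is the contrast with bagging that the abstract draws.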

Keywords/Search Tags:

Software defect prediction, Regularized cross-validation, Hyperparameter tuning, Ensemble learning, Stability

Related items

1	Research On Methods Of Static Software Defect Prediction Based On Block-regularized Cross-validation
2	Research On Block-regularized Cross-Validation Methods For Comparing Supervised Algorithms
3	Model Selection Method Based On Block-regularized Cross-validation
4	Design And Implementation Of Instance Selection Based Ensemble Cross-project Defect Prediction Method
5	Research On Software Defect Prediction Based On Ensemble Learning
6	Software Defect Prediction Based On An Ensemble Model
7	Research On Some Key Technologies Of Software Defect Prediction
8	Research On Software Defect Prediction Model Based On Active Ensemble Learning
9	Search Based Semi-supervised Ensemble Learning Research For Cross-project Defect Prediction
10	Research On Cross-Project Software Defect Prediction Based On Multi-Source Transfer Learning