
Research On Methods Of Static Software Defect Prediction Based On Block-regularized Cross-validation

Posted on: 2020-03-30 | Degree: Master | Type: Thesis
Country: China | Candidate: Y N Liu
GTID: 2428330578469047 | Subject: Software engineering
Abstract/Summary:
Software defect prediction forecasts potential software defects by building prediction models from software metrics. It helps locate defects, allocate resources sensibly, save time and improve development efficiency. This paper makes two proposals. For defect count models, it proposes a feature selection method based on the maximal information coefficient (MIC) and block-regularized m×2 cross-validation. For defect proneness models, it proposes an ensemble classifier based on majority voting. It then validates the effectiveness of both methods experimentally.

In defect count models, the candidate features include many metrics that are extracted from static code and aggregated from methods to classes (by sum, avg, max and min). Classical feature selection methods such as AIC and BIC must be applied with respect to a given model. As a result, the selected feature sets differ significantly across models and have no reasonable interpretation. The maximal information coefficient, proposed by Reshef et al., is a novel measure of the degree of dependence between two continuous variables, and its authors also gave a practical method for computing it from observed data.

This paper first selects features by the MIC between the defect count and each feature. It then applies a power transformation to the selected features. Finally, it builds principal-component Poisson and negative binomial regression models for defect prediction. All experiments are performed on the NASA KC1 class-level data set, with FPA, AAE and ARE as performance measures. The block-regularized m×2 cross-validated sequential t-test is used to compare the performance of two models.

The experimental results show that:
1) the sum-, avg- and max-aggregated features selected by MIC, though not the min-aggregated ones, differ significantly from those selected by AIC and BIC;
2) power transformation of the features improves the performance of most models;
3) principal component analysis and factor analysis yield two clear factors, one corresponding to the avg- and max-aggregated features and the other to the sum-aggregated features, so the model has a reasonable interpretation.

In conclusion, the sum-, avg- and max-aggregated features are significantly effective for software defect prediction, and the regression models built on MIC-selected features have some advantages.

In defect proneness models, the paper works with given classifiers and data sets. It partitions the data with block-regularized m×2 cross-validation and applies random undersampling during training. It then constructs an ensemble classifier by majority voting over the results of the m confusion matrices. To validate this ensemble classifier, experiments are conducted on four NASA classification data sets with seven classifiers and four performance measures (P, R, F1 and AUC). The results show that the ensemble classifier's performance gradually stabilizes as m increases. They also show that the ensemble classifier improves performance significantly, especially for the decision tree model.
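For reference, the MIC used for feature selection can be written as follows. This is the standard definition from Reshef et al. as we understand it, including their default grid bound B(n) = n^{0.6}. The abstract does not say which MIC estimator or parameters the thesis uses, so treat the bound and the notation (a, b, G) as assumptions.

```latex
% D: the n observed pairs (x_i, y_i); G ranges over grids with a columns and b rows
I^*(D, a, b) = \max_{G \,:\, a \times b \text{ grid}} I\left(D|_G\right)

% normalized mutual information (characteristic matrix entry)
M(D)_{a,b} = \frac{I^*(D, a, b)}{\log \min\{a, b\}}

\mathrm{MIC}(D) = \max_{a b < B(n)} M(D)_{a,b}, \qquad B(n) = n^{0.6}
```

Because I^*(D, a, b) cannot exceed log min{a, b}, MIC lies in [0, 1]. Features can therefore be ranked by MIC(feature, defect count). The cut-off used in the thesis is not stated in the abstract.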
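The principal-component count models can be summarized in standard generalized linear model notation. Here z_{ij} is the score of class i on the j-th principal component of the power-transformed, MIC-selected features. Two choices are assumptions because the abstract does not specify them: the number of retained components q, and the NB2 parameterization of the negative binomial model with dispersion α.

```latex
% Poisson model for the defect count Y_i of class i
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \beta_0 + \sum_{j=1}^{q} \beta_j z_{ij}

% Negative binomial (NB2): same log link, extra dispersion \alpha > 0
\mathrm{E}[Y_i] = \mu_i, \qquad \mathrm{Var}[Y_i] = \mu_i + \alpha \mu_i^2
```

As α approaches 0, the negative binomial model reduces to the Poisson model, whose variance equals its mean. The extra term lets the negative binomial model absorb over-dispersed defect counts.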
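The defect proneness pipeline can be sketched in plain Python. This is a minimal illustration under several assumptions, not the thesis implementation.

- The m partitions come from a two-level orthogonal array. This is one way to make the first folds of any two 2-fold splits share exactly n/4 samples, which is the balance property that block-regularized m×2 cross-validation targets, as we understand it.
- A one-feature threshold rule (the hypothetical `fit_stump`) stands in for the seven real classifiers.
- Each sample's m out-of-fold predictions are combined by majority vote.

All function names are illustrative.

```python
import random
from collections import Counter


def block_regularized_mx2_partitions(n, m, seed=0):
    """Return m two-fold partitions (fold_a, fold_b) of range(n).

    Assumed construction: shuffle the indices, cut them into 2**k equal
    blocks, and send block b to fold parity(b & col) for columns
    col = 1..m of a two-level orthogonal array. For m >= 2, the fold_a
    sets of any two partitions then share exactly n / 4 samples.
    """
    k = 1
    while 2 ** k - 1 < m:  # need at least m distinct nonzero columns
        k += 1
    n_blocks = 2 ** k
    if n % n_blocks:
        raise ValueError(f"this sketch needs n divisible by {n_blocks}")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    size = n // n_blocks
    blocks = [idx[b * size:(b + 1) * size] for b in range(n_blocks)]
    partitions = []
    for col in range(1, m + 1):
        folds = ([], [])
        for b, block in enumerate(blocks):
            folds[bin(b & col).count("1") % 2].extend(block)
        partitions.append(folds)
    return partitions


def undersample(train, y, rng):
    """Randomly drop majority-class samples so both classes are equally sized."""
    pos = [i for i in train if y[i] == 1]
    neg = [i for i in train if y[i] == 0]
    small, large = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not small:  # only one class present: nothing to balance against
        return list(train)
    return small + rng.sample(large, len(small))


def fit_stump(X, y, train):
    """Hypothetical stand-in learner: predict 1 when one feature >= threshold."""
    best = (-1, 0, float("inf"))  # (training accuracy, feature, threshold)
    for j in range(len(X[0])):
        # an infinite threshold means "always predict 0"
        for t in sorted({X[i][j] for i in train}) + [float("inf")]:
            acc = sum((X[i][j] >= t) == (y[i] == 1) for i in train)
            if acc > best[0]:
                best = (acc, j, t)
    _, j, t = best
    return lambda row: int(row[j] >= t)


def mx2_vote_predictions(X, y, m, seed=0):
    """Predict every sample once per partition (with the model trained on the
    other fold, after undersampling), then take the majority of its m votes."""
    rng = random.Random(seed)
    votes = [[] for _ in y]
    for fold_a, fold_b in block_regularized_mx2_partitions(len(y), m, seed):
        for train, test in ((fold_a, fold_b), (fold_b, fold_a)):
            model = fit_stump(X, y, undersample(train, y, rng))
            for i in test:
                votes[i].append(model(X[i]))
    # Use odd m; with even m, Counter breaks ties by the first vote seen.
    return [Counter(v).most_common(1)[0][0] for v in votes]


def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    return (pairs.count((1, 1)), pairs.count((0, 1)),
            pairs.count((1, 0)), pairs.count((0, 0)))


# Toy usage: one metric that perfectly separates 10 defective classes.
X = [[v] for v in range(40)]
y = [int(v >= 30) for v in range(40)]
print(confusion(y, mx2_vote_predictions(X, y, m=3)))
```

With odd m, every sample receives a strict majority. P, R and F1 follow directly from the returned confusion counts. The toy data has a single perfectly separating metric, so it only exercises the mechanics and says nothing about real performance.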
Keywords/Search Tags: MIC, Defect count, Defect proneness, Block-regularized m×2 cross-validation, Ensemble classifier