Research On Software Defect Prediction Model For High Dimensional And Imbalanced Data

Posted on:2021-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhang

GTID:2568306104971229

Subject:Computer Science and Technology

Abstract/Summary:

Software defect prediction can effectively assist software testing and help guarantee software quality. However, class imbalance makes a model pay more attention to non-defective modules and train insufficiently on defective ones, which greatly reduces classification performance on the defective modules. A large number of irrelevant and redundant features also reduces prediction accuracy. In addition, a single classifier struggles to predict defect data with diverse distributions. The main contents of this thesis are as follows.

First, to address the class imbalance of software defect data, an ADASYNTomek combined sampling algorithm is proposed. An adaptive method is used to focus on minority-class samples that are difficult to learn, and the Tomek Links method is used to ensure that the data set is balanced while reducing noise samples and improving data quality.

Second, to address the high dimensionality of the data, a deep feature selection algorithm based on comprehensive sorting and cross-recursion elimination (CR-RFECV) is proposed. The correlation between features and classes is analyzed comprehensively through the information gain ratio and the chi-square value to eliminate irrelevant features; the Spearman correlation coefficient is used to analyze redundancy between features and remove highly redundant ones; and a cross recursive feature elimination method based on ridge regression is used to make a deeper selection. In this way, the poor generalization ability and insufficient stability of a single feature selection method can be overcome, and the calculation accuracy can be improved while ensuring rapid dimensionality reduction.

Moreover, because a model built from a single classifier is not comprehensive enough to predict diversely distributed software defect data, multiple base classifiers need to be integrated. Therefore, an ATW-Bagging ensemble classification algorithm is proposed. The algorithm considers both the training and decision stages. In the training stage, diversity of data distribution is introduced while all samples are considered comprehensively, and the ADASYNTomek method is used to balance training subsets with different imbalance ratios. In the decision stage, different base classifiers are selected to increase the diversity of base classifiers, and weighted integration is performed based on the cost of misclassification.

When constructing the software defect prediction model, the data is briefly preprocessed and the CR-RFECV algorithm is used to reduce its dimensionality; the ATW-Bagging ensemble classification algorithm is then used to classify the software modules and obtain the final predicted class. Finally, the CR-RFECV algorithm is compared with other dimensionality reduction methods, and the ATW-Bagging ensemble classification algorithm is compared with single classification algorithms, the traditional Bagging algorithm, and recent software defect prediction algorithms to verify its effectiveness.
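As an illustration of the noise-cleaning half of the combined sampling idea, the following is a minimal pure-Python sketch of Tomek link detection, using the standard definition: two samples of different classes that are each other's nearest neighbor. It is not the thesis's ADASYNTomek implementation. The function names, the toy data, the Euclidean distance, and the choice to drop both members of each link are all assumptions made for this example, and the oversampling step is omitted.

```python
from math import dist


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_tomek_links(points, labels):
    """Drop both members of every Tomek link (an assumed cleaning policy)."""
    drop = {k for pair in tomek_links(points, labels) for k in pair}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: label 1 = defective module, label 0 = non-defective module.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0), (3.2, 3.0)]
labels = [0, 0, 0, 1, 1, 1]

links = tomek_links(points, labels)
clean_points, clean_labels = remove_tomek_links(points, labels)
```

In this toy set, only samples 2 and 3 form a link: they are mutual nearest neighbors with opposite labels, so they sit on the class boundary and are treated as borderline or noisy. Some variants remove only the majority-class member of each link instead of both.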

Keywords/Search Tags:

software defect prediction, high dimensionality, class imbalance, feature selection, combined sampling, ensemble learning

Related items

1	Research On Software Defect Prediction Method Based On Fusion Feature Selection And Ensemble Learning
2	Research On High-dimensional Data Processing In Software Defect Prediction
3	Researches And Applies On Software Defect Prediction Method Based On Ensemble Learning
4	Based On Ensemble Sampling And Data Imbalance Self-adaptive Processing Method In Defect Prediction Context
5	Research On Software Defect Prediction Method Based On Ensemble Learning And Multi-hierarchical LSTM Feature Fusion
6	Research On Software Defect Prediction Method Based On Feature Selection And Oversampling
7	Research On Software Defect Prediction Model Based On Active Ensemble Learning
8	Research On Software Defect Prediction Based On Ensemble Learning
9	Research On Software Defect Prediction Based On Ensemble Learning
10	Research On Prediction Method Of Software Defect Quantity Based On Machine Learning