Research On Sampling Integration Algorithm Of Unbalanced Data In Software Defect Prediction

Posted on:2024-06-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Jia

GTID:2568306917965459

Subject:Computer technology

Abstract/Summary:

Software defect prediction plays an important role in the software life cycle. Accurately locating the modules that contain defects helps improve software quality and reduce testing costs. Many researchers have framed defect-proneness prediction as a binary classification problem in machine learning and have proposed a series of effective prediction methods. In practice, however, existing methods still suffer from imbalanced data, unclear classification boundaries, and low prediction accuracy, and addressing these problems has become a research hotspot in related fields. This thesis studies the modeling of imbalanced data in software defect prediction at both the data level and the algorithm level. The main work is as follows:

(1) To address the complex distribution of class-imbalanced data, as well as the sample overlap and unclear boundaries that arise after oversampling, an adaptive oversampling algorithm with filtering, F-LDBS (Local density based BIRCH clustering with filtering), is proposed. In the class-imbalance processing stage, LDBS introduces the concept of local sample density: after the defective samples are clustered, new samples are generated by interpolation according to subcluster density, so that the new defect samples are spread across the space of the defect dataset while adapting to both within-class and between-class imbalance. For data cleaning, CLOR (Closest List Overlapping Data Remove), a closest-list algorithm for cleaning class-overlapping data based on neighborhood search, is proposed. It weighs sensitivity against specificity, uses proximity search to accurately identify samples in overlapping regions, and alleviates the problem of ambiguous classification boundaries. On AEEEM and NASA, two datasets commonly used for software defect prediction, a decision tree classifier is used to compare several oversampling methods and verify the effectiveness of the LDBS and F-LDBS oversampling algorithms.

(2) Traditional classification algorithms tend to ignore minority-class samples when predicting on class-imbalanced datasets, which leads to highly biased prediction models. To address this, CatBoost ensemble learning is studied theoretically, grid search is used to find the optimal parameters for the AEEEM and NASA datasets, and the tuned CatBoost ensemble learner is compared experimentally with a range of commonly used machine learning classifiers. The results demonstrate the applicability and efficiency of the CatBoost ensemble learner in software defect prediction.

(3) In the model construction stage, building on the above work, this thesis combines F-LDBS oversampling with CatBoost ensemble learning to build Cat-LDBst, a sampling-ensemble software defect prediction model for imbalanced data, with the aim of mitigating the class imbalance problem as far as possible and improving the performance of the prediction model. Comparison with several sampling-ensemble algorithms for imbalanced data processing verifies the superiority of the Cat-LDBst prediction model and the soundness and feasibility of its main ideas.
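
The density-guided interpolation idea behind LDBS can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the defective samples have already been grouped into subclusters (for example by BIRCH), uses mean pairwise distance as a stand-in for the inverse of local density, and gives sparser subclusters a larger share of the synthetic samples. The function names `mean_pairwise_distance`, `allocate`, and `oversample` are hypothetical, and the allocation rule is an assumption, since the abstract does not state the exact formula.

```python
import math
import random


def mean_pairwise_distance(points):
    """Sparsity proxy (assumption): average Euclidean distance between all pairs."""
    pairs = [(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def allocate(weights, n_new):
    """Split n_new across subclusters in proportion to weights (largest remainder)."""
    total = sum(weights)
    if total == 0:  # all subclusters degenerate: fall back to an even split
        weights, total = [1.0] * len(weights), float(len(weights))
    raw = [w / total * n_new for w in weights]
    counts = [int(r) for r in raw]
    # Hand the leftover samples to the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: n_new - sum(counts)]:
        counts[i] += 1
    return counts


def oversample(subclusters, n_new, seed=0):
    """Generate n_new synthetic minority samples by interpolating inside subclusters.

    subclusters: list of lists of feature tuples (clustered defective samples).
    Sparser subclusters receive more synthetic samples (illustrative assumption).
    """
    rng = random.Random(seed)
    usable = [c for c in subclusters if len(c) >= 2]  # need two points to interpolate
    if not usable or n_new <= 0:
        return []
    counts = allocate([mean_pairwise_distance(c) for c in usable], n_new)
    synthetic = []
    for cluster, k in zip(usable, counts):
        for _ in range(k):
            a, b = rng.sample(cluster, 2)
            t = rng.random()  # position along the segment from a to b
            synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


if __name__ == "__main__":
    sparse = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    dense = [(10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    new_points = oversample([sparse, dense], n_new=11)
    print(len(new_points), "synthetic samples generated")
```

Because each synthetic point lies on a segment between two samples of the same subcluster, new samples stay inside regions already occupied by defective modules, which is one way to limit the overlap that a later cleaning step such as CLOR would have to remove.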

Keywords/Search Tags:

software defect prediction, class imbalance, oversampling, CatBoost

Related items

1	Research On Unbalanced Data Classification Algorithm In Software Defect Prediction
2	Research On Software Defect Prediction Method Based On Feature Selection And Oversampling
3	Research And Implementation Of Software Defect Prediction Model Construction And Sharing Methods
4	Software Defect Prediction Strategy Design For Imbalanced Data
5	Wide Research Of Data Mining With Machine Learning On Software Defect Prediction
6	Research On High-dimensional Data Processing In Software Defect Prediction
7	Research On Software Defect Prediction Technology For Few-sample Data
8	Research On Data Preprocessing Technology In Cross Project Software Defect Prediction
9	Research On Software Defect Prediction Model For High Dimensional And Imbalanced Data
10	Research On Software Defect Prediction Based On Learning Mechanism