
Research On Unbalanced Data Classification Algorithm In Software Defect Prediction

Posted on: 2022-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z D Rao
GTID: 2518306749458194
Subject: Art
Abstract/Summary:
As society and technology come to depend more heavily on software systems, assuring software quality and reliability becomes increasingly important. Software defect prediction is an important aid to module testing: it mines historical software repositories, extracts the relevant metric information into data sets, and applies machine learning to those data sets to build prediction models that identify the modules most likely to contain defects, so that testing resources can be allocated more effectively. In practice, however, defective samples are scarce, so a trained model tends to assign test data to the majority class, which seriously degrades classification performance in software defect prediction. Methods are therefore needed to alleviate the class imbalance. The main work of this thesis is as follows:

(1) At the data pre-processing level, a novel sampling method, AJCC-Ram (Adaptive Judgment Cure Clustering Random Sampling), is proposed. It uses the class distribution around the defective samples to oversample accurately and form a reasonable, balanced data set. The method applies an improved ADASYN adaptive oversampling at the class edge and CURE-SMOTE oversampling at the class center; CLNI is then used to filter noise out of the balanced data set. Various sampling methods are compared, using a Naive Bayes classifier, on the AEEEM and NASA data sets commonly used in software defect prediction. The F1 results show that AJCC-Ram obtains more stable and efficient prediction results.

(2) At the classification-algorithm level, the XGBoost (eXtreme Gradient Boosting) ensemble learner is studied theoretically in order to further improve classification on class-imbalanced data, and its parameters are tuned on the AEEEM and NASA data sets. XGBoost is compared with a range of machine learning classifiers, and its effectiveness is demonstrated.

(3) In the model construction stage, building on the work above, the thesis establishes XG-AJCC (AJCC-Ram + XGBoost), a software defect prediction model for imbalanced data based on oversampling and ensemble learning. On the AEEEM and NASA data sets, it is compared with several imbalanced-data methods that combine sampling with ensemble learning. The F1 values show that XG-AJCC effectively reduces the influence of data imbalance on software defect prediction and obtains more stable and efficient prediction results.
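
The two ideas the abstract relies on throughout, oversampling the defective (minority) class and scoring with F1, can be illustrated with a minimal sketch. This is not the AJCC-Ram method: it uses plain SMOTE-style interpolation between minority neighbours as a stand-in, and the function names (`smote_like_oversample`, `f1_score`) and the toy data are assumptions made purely for illustration.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Illustrative stand-in only; it is not the thesis's AJCC-Ram method."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: 3 defective modules vs 6 clean ones (two metrics each).
defective = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
clean = [[5.0, 5.0], [5.5, 4.8], [6.0, 5.2], [5.2, 5.9], [4.8, 5.1], [6.1, 4.9]]

# Add synthetic defective samples until both classes have the same size.
new_points = smote_like_oversample(defective, n_new=len(clean) - len(defective))
balanced_defective = defective + new_points

# A model that always predicts the majority class (0) gets F1 = 0 on the
# defective class, which is why the abstract reports F1 rather than accuracy.
labels = [1, 0, 1, 1, 0]
always_clean = [0, 0, 0, 0, 0]
some_model = [1, 0, 0, 1, 1]
majority_f1 = f1_score(labels, always_clean)
model_f1 = f1_score(labels, some_model)
```

In a fuller pipeline the balanced data would be passed to a classifier such as Naive Bayes or XGBoost, and F1 would be computed on held-out modules rather than on hand-written label lists.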
Keywords/Search Tags: Software defect prediction, Class imbalance, Oversampling, XGBoost