Research On Software Defect Prediction Based On Hybrid Sampling And Integrated Learning

Posted on: 2022-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: L L Yan
Full Text: PDF
GTID: 2518306548499804
Subject: Computer technology
Abstract/Summary:
With the advent of the information age, software is applied ever more widely, and every industry is raising its requirements for software quality. Software quality is closely tied to the defects the software contains: defects seriously threaten its security and reliability. If defects are discovered only after the software has been put into use, the losses may be irreparable. Predicting software defects quickly and accurately has therefore become particularly important. In recent years, software defect prediction based on machine learning has attracted widespread attention. Many scholars have cast software defect prediction as a classification problem in machine learning and proposed a series of machine-learning-based defect prediction methods. However, existing methods still face several problems in practice: class imbalance (defective samples are usually far fewer than non-defective ones), low prediction accuracy, and a large amount of small-sample data. Solving these problems effectively has become a research hotspot in the field.

This thesis systematically studies software defect prediction from the perspective of data sampling and Stacking ensemble learning. First, for the class imbalance problem in defect data, it compares the effects of different sampling methods on the performance of a Stacking-based software defect prediction model. Second, it combines data sampling with the Stacking method and proposes a method based on hybrid sampling and Random-Stacking. Third, to increase the diversity of base classifiers, it proposes a Stacking method based on attribute space disturbance and uses it to predict software defects. The main work of this thesis can be summarized as follows:

(1) Research on the Influence of Different Sampling Methods on the Performance of Software Defect Prediction Model Based on Stacking

To deal with class imbalance in historical defect data, this part studies how to combine data sampling methods with the Stacking-based software defect prediction model (Stacking model for short), so as to provide an effective solution to the class imbalance problem. It focuses on the influence of four sampling methods (Borderline-SMOTE+Tomek Links, SMOTE, Borderline-SMOTE and ADASYN) on the performance of the Stacking model. Each sampling method is combined with the Stacking model, and the best combination is identified by comparing the defect prediction performance of the four combinations. Experiments on multiple NASA MDP and Promise datasets show that combining the Borderline-SMOTE+Tomek Links sampling method with the Stacking model gives the best solution to the class imbalance problem.

(2) Research on Software Defect Prediction Based on Hybrid Sampling and Random-Stacking

To address class imbalance and low prediction accuracy in software defect prediction, a software defect prediction algorithm, DP_HSRS, based on hybrid sampling and Random-Stacking is proposed. DP_HSRS first uses the Borderline-SMOTE+Tomek Links hybrid sampling method to balance the imbalanced data, and then uses the Random-Stacking algorithm to predict software defects on the balanced dataset. Random-Stacking is an improvement on the traditional Stacking algorithm. It builds multiple Stacking classifiers by fusing several classic classification algorithms with the Bagging mechanism, combines these Stacking classifiers by voting to obtain an ensemble classifier, and finally uses the ensemble classifier to predict software defects. Experiments on multiple NASA MDP and Promise datasets show that DP_HSRS achieves better defect prediction performance than existing algorithms.

(3) Stacking Algorithm Based on Attribute Space Disturbance and its Application in Software Defect Prediction

Ensemble learning is widely used in software defect prediction, but existing methods still have two problems: the diversity of base classifiers is difficult to guarantee, and prediction accuracy is low. To solve these problems, this thesis proposes ASP_Stacking, a Stacking algorithm based on perturbation of the attribute space, and uses it to predict software defects. As an improvement on the traditional Stacking algorithm, ASP_Stacking first uses hybrid sampling to process the class-imbalanced data. It then generates multiple attribute subspaces on the balanced data and builds a Stacking classifier in each subspace. Finally, these Stacking classifiers are integrated to predict defects. Experiments on multiple NASA MDP datasets show that increasing the diversity of base classifiers by perturbing the attribute space can improve defect prediction performance.
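The hybrid sampling idea used in DP_HSRS and ASP_Stacking (oversample the defective class, then clean the class boundary with Tomek Links) can be illustrated with a minimal standard-library sketch. This is not the thesis implementation: it substitutes plain SMOTE for Borderline-SMOTE to stay short, the function names `smote`, `tomek_links` and `hybrid_sample` are hypothetical, and it assumes numeric features and at least two minority samples.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (plain SMOTE, an assumption).

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A Tomek link is a pair of samples from different classes that are
    each other's nearest neighbour.
    """
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def hybrid_sample(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class to parity, then drop Tomek links."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_majority = len(X) - len(minority)
    new = smote(minority, n_majority - len(minority), k, seed)
    X_bal = list(X) + new
    y_bal = list(y) + [minority_label] * len(new)
    # This sketch removes both members of every link; removing only the
    # majority-class member is another common choice.
    drop = {i for pair in tomek_links(X_bal, y_bal) for i in pair}
    keep = [i for i in range(len(X_bal)) if i not in drop]
    return [X_bal[i] for i in keep], [y_bal[i] for i in keep]
```

Calling `hybrid_sample(X, y)` on a list of feature tuples `X` and 0/1 labels `y` returns a resampled copy that can be passed to any classifier. The quadratic nearest-neighbour search is acceptable only for small illustrative datasets.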
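The attribute space perturbation behind ASP_Stacking (train one classifier per random attribute subspace, then integrate their predictions) can be sketched in the same spirit. The assumptions here are significant: a tiny nearest-centroid learner stands in for each Stacking classifier, integration is done by majority vote, and the names `NearestCentroid`, `fit_subspace_ensemble` and `predict` are hypothetical rather than taken from the thesis.

```python
import math
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base learner (assumption; the thesis uses Stacking classifiers)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            # Per-feature mean of all rows belonging to this class
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        return min(self.centroids,
                   key=lambda lab: math.dist(x, self.centroids[lab]))


def project(x, features):
    """Keep only the attributes whose indices are listed in `features`."""
    return [x[f] for f in features]


def fit_subspace_ensemble(X, y, n_models=5, subspace_size=2, seed=0):
    """Train one model per randomly drawn attribute subspace."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_models):
        features = sorted(rng.sample(range(n_features), subspace_size))
        model = NearestCentroid().fit([project(x, features) for x in X], y)
        ensemble.append((features, model))
    return ensemble


def predict(ensemble, x):
    """Majority vote over the subspace models."""
    votes = Counter(model.predict_one(project(x, features))
                    for features, model in ensemble)
    return votes.most_common(1)[0][0]
```

Because each model sees a different subset of attributes, the members disagree more often than models trained on the full attribute set, which is the diversity effect the abstract describes. In practice the data passed to `fit_subspace_ensemble` would already have been balanced by hybrid sampling.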
Keywords/Search Tags: software defect prediction, imbalanced data, hybrid sampling, ensemble learning, Random-Stacking, attribute space disturbance, ASP_Stacking