
Research on Software Defect Prediction Model Based on Active Ensemble Learning

Posted on: 2022-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Wang
GTID: 2518306536996749
Subject: Computer technology
Abstract/Summary:
Software defect prediction plays an important role in the software life cycle. It can accurately identify defects and their number, and locating the specific modules that contain defects helps improve software quality, ensure software performance, and reduce testing cost. However, defective modules make up only a very small proportion of all modules, so defective samples are hard to capture during model training. High-dimensional, redundant features may also reduce the final classification accuracy. Moreover, data from newly developed software are incomplete in several respects, especially sample labels, which makes it difficult to build an accurate prediction model. This thesis therefore studies and analyzes these problems in depth.

First, to address data imbalance, a data processing method based on SMOTE-ENN is proposed. Outliers in the data set are detected and handled; then, following the idea of combined sampling, the data are balanced while samples are cleaned according to how the classes of their K nearest neighbors overlap, so that the classes can be separated more easily. Second, to deal with high-dimensional redundant features, a feature combination method, SELECTK-GBDT, is proposed. Features are first selected by their correlation with software defects using the chi-square test, and the key features are then combined using decision trees to represent the data better. Third, a software defect prediction model based on active ensemble learning is proposed to address the scarcity of historical data in newly developed projects. After a small number of samples are selected at random, the remaining samples are ranked by information entropy, the most informative samples are labeled, and the sample pool is updated. A Boosting ensemble learning model is used within the active learning process. The process is repeated over multiple iterations, so that the model can achieve higher prediction accuracy using only a small number of training samples. Finally, comparative experiments on public data sets verify the validity and generalization ability of the SMOTE-ENN data processing method, the SELECTK-GBDT feature combination method, and the final software defect prediction model based on active ensemble learning.
Keywords/Search Tags: Software defect prediction, Class imbalance, Feature combination, Active learning, Ensemble learning
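
The entropy-based query step described in the abstract (rank unlabeled samples by information entropy, label the most informative ones, update the pool) might look roughly like the sketch below. It is not the thesis's implementation: the function names, the dictionary-based pool, and the `oracle` callback are hypothetical, and the predicted probabilities are assumed to come from whatever classifier is used in the loop, for example a Boosting ensemble.

```python
import math


def binary_entropy(p_defect):
    """Shannon entropy (in bits) of a two-class prediction with P(defective) = p_defect."""
    entropy = 0.0
    for p in (p_defect, 1.0 - p_defect):
        if p > 0.0:  # treat 0 * log(0) as 0
            entropy -= p * math.log2(p)
    return entropy


def select_queries(pool_probs, batch_size):
    """Return the ids of the `batch_size` unlabeled modules with the highest entropy.

    `pool_probs` maps a module id to the current model's predicted probability
    that the module is defective (a hypothetical input format).
    """
    ranked = sorted(
        pool_probs,
        key=lambda module_id: binary_entropy(pool_probs[module_id]),
        reverse=True,
    )
    return ranked[:batch_size]


def update_pool(pool_probs, labeled, queried, oracle):
    """Move the queried modules from the unlabeled pool into the labeled set."""
    for module_id in queried:
        labeled[module_id] = oracle(module_id)  # hypothetical annotator callback
        del pool_probs[module_id]
```

In a full loop, the model would be retrained on `labeled` after each call to `update_pool`, the probabilities for the remaining pool would be recomputed, and the query step would repeat until the labeling budget is used up.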