Research On Software Defect Prediction Based On Unbalanced Ensemble Classification

Posted on: 2022-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2518306338993769
Subject: Mechanical engineering
Abstract/Summary:
With the rapid development of advanced technologies such as cloud computing and artificial intelligence, application scenarios in fields such as telecommunications, commerce, and transportation are becoming more abundant. In the transportation, medical, and health industries, the size of the related system and application software has increased exponentially. According to statistics, data volumes have increased tenfold every five years. Software is the core force leading scientific and technological innovation, and a strong software industry is one of the keys to achieving national scientific and technological self-reliance. In the process of software development, if the expected requirements and specifications are not met, system errors or crashes can result. Problems that prevent software or programs from running smoothly are called software defects. If software defects are not discovered and corrected in time, their gradual accumulation and propagation will affect the reliability and stability of the software. Software defect prediction has therefore become an important research direction with great research and practical value. Industry and academia use machine learning and data mining to replace time-consuming and labor-intensive traditional detection methods and to improve prediction performance. At present, software defect prediction has two typical problems: class imbalance and poor prediction performance. This thesis studies these two problems systematically. It improves the SMOTE algorithm at the data level to alleviate class imbalance, and combines it with a parameter-optimized LightGBM ensemble algorithm to improve the performance of software defect prediction. The main results of this research are as follows.

Firstly, the performance of three classical oversampling algorithms and three ensemble algorithms for software defect prediction is compared. The three oversampling algorithms are SMOTE, Borderline-SMOTE, and ADASYN; the three ensemble algorithms are Random Forest, XGBoost, and LightGBM. Experiments with the nine resulting combinations on 10 NASA data sets show that oversampling helps improve the classification performance of the classifier. Among the three ensemble algorithms, LightGBM performs best and takes the least time.

Secondly, the internal distribution of unbalanced software defect data makes it difficult to build a classification model. Starting from the data level, this thesis improves SMOTE, the most widely used oversampling method. By identifying and eliminating noise samples in time and dynamically adjusting the nearest-neighbor parameter of the SMOTE algorithm, the synthesized minority data can retain the characteristics of the original distribution. Cross-validation with three different classifiers on 10 imbalanced data sets from KEEL shows that the proposed AdaN_SMOTE algorithm outperforms traditional oversampling algorithms and achieves better Accuracy, Recall, AUC, and F1 values.

Finally, to address the poor performance of software defect prediction, the hyperparameters of the LightGBM algorithm are analyzed and the three with the greatest impact on the experiments are selected: num_leaves, max_depth, and feature_fraction. The final hyperparameter values are confirmed by 5-fold cross-validation. Compared with traditional grid search, the tuning time is greatly reduced. To further verify that the proposed AdaN_SMOTE algorithm and the LightGBM algorithm are efficient, they are applied to the typical NASA data sets in the field of software defect prediction. Comparing the results of the combined algorithm with those of other oversampling and ensemble algorithms demonstrates the efficiency of the proposed algorithm, which obtains better defect prediction performance.
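
The data-level step described in the abstract (removing noisy minority samples, then interpolating new ones SMOTE-style) might be sketched as below. This is only an illustration under stated assumptions, not the thesis's AdaN_SMOTE algorithm: the noise rule (a minority sample whose k nearest neighbours are all majority samples) is borrowed from the Borderline-SMOTE definition of noise, the abstract does not specify how the nearest-neighbor parameter is adjusted so k is merely capped by the number of available neighbours, and all function names are hypothetical.

```python
import math
import random


def k_nearest(point, pool, k):
    """Indices of the k points in pool closest to point (Euclidean distance)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def drop_noise(minority, majority, k=3):
    """Keep minority samples with at least one minority point among their
    k nearest neighbours (assumed noise rule, not the thesis's criterion)."""
    kept = []
    for idx, p in enumerate(minority):
        # All other samples, tagged True if they belong to the minority class.
        pool = [(q, True) for j, q in enumerate(minority) if j != idx]
        pool += [(q, False) for q in majority]
        points = [q for q, _ in pool]
        if any(pool[i][1] for i in k_nearest(p, points, k)):
            kept.append(p)
    return kept


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    if n_new == 0:
        return []
    k = min(k, len(minority) - 1)  # simple cap, not an adaptive rule
    if k < 1:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [q for j, q in enumerate(minority) if j != i]
        neighbour = others[rng.choice(k_nearest(base, others, k))]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def oversample(minority, majority, k_noise=3, k_smote=5, seed=0):
    """Filter noisy minority samples, then oversample up to the majority size."""
    clean = drop_noise(minority, majority, k_noise)
    n_new = max(0, len(majority) - len(clean))
    return clean + smote(clean, n_new, k_smote, seed)
```

Calling `oversample(minority, majority)` with two lists of equal-length numeric tuples returns the filtered minority samples followed by the synthetic ones. Filtering before interpolation matters because plain SMOTE would otherwise draw segments toward isolated minority points sitting inside the majority region.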
Keywords/Search Tags: Software defect prediction, AdaN_SMOTE algorithm, unbalanced data, LightGBM