Research On Prediction Method Of Software Defect Quantity Based On Machine Learning

Posted on: 2023-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Liu
Full Text: PDF
GTID: 2568307025453374
Subject: Project management
Abstract/Summary:
As software has become an important driver of social development, ensuring software quality has become one of the most important tasks in software development and application. Software quality management can effectively improve the reliability and durability of software. Software defect prediction is a common technique in software quality management: it identifies defective software modules early, guides testers in allocating resources reasonably, allows defects to be handled in time, and helps ensure software quality. Software defect prediction mainly extracts defect feature information from historical software data and builds prediction models, based on statistical or machine learning algorithms, to predict the defects of software modules. Machine-learning-based prediction of the number of software defects has become one of the hot topics in the defect prediction field. However, several problems still affect prediction accuracy, such as feature redundancy in the data sets, class imbalance, and differences in data distribution between projects. Therefore, building on existing research into machine-learning-based software defect number prediction, this thesis studies feature selection, class imbalance handling, and cross-project defect number prediction, and proposes corresponding solutions.

1. To address the redundant or irrelevant features present in software defect number prediction, this thesis proposes IDPCP (Improved Density Peak Clustering and Pearson), a feature selection method based on feature clustering and correlation that combines an improved density peak clustering algorithm with the Pearson correlation coefficient. First, the Gini coefficient is introduced to improve the density peak clustering algorithm, and the improved algorithm is used to cluster the defect features. This eliminates the influence of human factors on the clustering process and groups highly correlated features into the same cluster. Then, within each cluster, the Pearson correlation coefficient between each feature and the defect number category is calculated to select the optimal feature subset. The experimental results show that the proposed method can effectively eliminate some redundant or irrelevant features, improve the quality of software defect data sets, and thereby improve the accuracy of software defect number prediction.

2. To address the class imbalance problem in software defect number prediction, this thesis proposes HS-Stacking, a software defect number prediction method based on hybrid sampling and ensemble learning. First, boundary information and weight relations are introduced to improve the SMOTE algorithm, so that the improved SMOTE oversampling method can be applied to oversampling in defect number prediction. Then, a hybrid sampling method that combines the improved SMOTE oversampling with random undersampling is used to balance the defect data set. Finally, the Stacking ensemble method integrates Linear Regression, Support Vector Regression, Bayesian Ridge Regression, and Decision Tree Regression into a Stacking ensemble learning model that predicts the number of software defects. The experimental results show that the method not only addresses the class imbalance problem in defect number prediction but also avoids overfitting, and effectively improves the accuracy of software defect number prediction.

3. To address the lack of sufficient historical data for new projects and the differences in data distribution between projects, this thesis proposes a cross-project software defect number prediction method based on deep learning. The thesis combines transfer learning with a deep learning framework, accounts for the differences in feature distribution between projects, introduces the maximum mean discrepancy (MMD) distance into the transfer deep learning framework, and proposes a cross-project software defect number prediction method based on Transfer Stacked Denoising Auto-Encoders (TSDAEs). The method takes traditional software metrics as the network input. It extracts transferable deep-level features shared by the source project and the target project by simultaneously minimizing the reconstruction error of the source and target feature data and the data distribution difference between the two projects, and then completes cross-project defect number prediction based on these transferable deep-level features. The experimental results show that the method can effectively extract transferable deep-level features between projects, eliminate the influence of data distribution differences on the prediction process, and improve the accuracy of cross-project defect number prediction.
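
The correlation step of the feature selection method in contribution 1 can be illustrated with a minimal sketch. It assumes the clusters are already given (the improved density peak clustering itself is not reproduced here) and that one top-ranked feature is kept per cluster. The thesis abstract does not state how many features are retained, so that rule, the function names, and the sample data are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_per_cluster(features, clusters, defects):
    """Keep, from each cluster, the feature most correlated with the defect count.

    features: dict mapping feature name -> list of values (one per module)
    clusters: list of lists of feature names (assumed output of a clustering step)
    defects:  list of defect counts (one per module)
    """
    selected = []
    for cluster in clusters:
        # Assumption: rank by absolute correlation and keep only the best feature.
        best = max(cluster, key=lambda name: abs(pearson(features[name], defects)))
        selected.append(best)
    return selected


# Hypothetical data: four modules, three software metrics, two clusters.
features = {
    "loc": [10, 20, 30, 40],
    "statements": [11, 19, 33, 38],
    "comments": [5, 1, 4, 2],
}
clusters = [["loc", "statements"], ["comments"]]
defects = [0, 1, 2, 3]
chosen = select_per_cluster(features, clusters, defects)
```

Here `loc` and `statements` carry nearly the same information, so only one of them survives, which is the kind of redundancy removal the method aims for.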
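
The hybrid sampling idea in contribution 2 can be sketched as follows. This is the plain SMOTE-style interpolation adapted to a numeric target, not the boundary- and weight-aware variant the thesis proposes, and it assumes the rare cases are modules that contain defects. All names and parameters are hypothetical.

```python
import random


def smote_like_oversample(samples, targets, n_new, k=2, seed=0):
    """Create n_new synthetic (features, target) pairs by interpolating
    between a rare sample and one of its k nearest rare neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Rank the other rare samples by squared Euclidean distance to base.
        others = [j for j in range(len(samples)) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(base, samples[j])))
        j = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        new_x = [a + gap * (b - a) for a, b in zip(base, samples[j])]
        # Assumption: the defect count is interpolated with the same weight.
        new_y = targets[i] + gap * (targets[j] - targets[i])
        synthetic.append((new_x, new_y))
    return synthetic


def random_undersample(rows, n_keep, seed=0):
    """Randomly keep n_keep rows of the majority (defect-free) modules."""
    return random.Random(seed).sample(rows, n_keep)


# Hypothetical data: three defective modules, six defect-free modules.
rare_x = [[1.0, 2.0], [2.0, 3.0], [4.0, 1.0]]
rare_y = [1, 2, 4]
majority = [([0.1 * i, 0.2 * i], 0) for i in range(6)]

new_rare = smote_like_oversample(rare_x, rare_y, n_new=3)
kept_majority = random_undersample(majority, n_keep=4)
```

The balanced set would then be the original rare modules plus `new_rare` and `kept_majority`, which a stacked regression model could be trained on.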
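
The distribution-difference term in contribution 3 can be illustrated with a minimal sketch of the (biased) empirical squared maximum mean discrepancy. It assumes an RBF kernel with a hypothetical bandwidth parameter `gamma`; the abstract does not say which kernel or estimator the thesis uses.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mmd_squared(source, target, gamma=1.0):
    """Biased empirical estimate of the squared MMD between two samples:
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Hypothetical feature vectors from a source project and two target projects.
source = [[0.0, 0.0], [1.0, 0.0]]
near_target = [[0.1, 0.0], [1.1, 0.0]]   # distribution close to the source
far_target = [[5.0, 5.0], [6.0, 5.0]]    # distribution far from the source

near = mmd_squared(source, near_target)
far = mmd_squared(source, far_target)
```

In a transfer auto-encoder of the kind described, a term like this would be computed on the hidden features of both projects and added, with some weighting, to the reconstruction error, so that training pulls the two feature distributions together. The weighting scheme is not given in the abstract.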
Keywords/Search Tags: Machine Learning, Defect Number Prediction, Feature Selection, Class Imbalance, Transfer Learning