
Study On Software Defect Prediction Approach Under Different Scenarios

Posted on: 2024-01-16; Degree: Doctor; Type: Dissertation
Country: China; Candidate: S G Zhang
GTID: 1528307118477794; Subject: Computer software and theory
Abstract/Summary:
Software defect prediction is currently a research hotspot in software engineering. Most work in this area applies statistical or machine learning methods to mine and analyze the historical data of software projects, so that defective modules can be found in a timely and effective manner, which helps optimize resource allocation and reduce development costs. As more open-source project datasets are published, the data sources available for training prediction models are becoming increasingly rich. According to the source of the project data used to build and evaluate models, software defect prediction can be divided into four scenarios: within-version, cross-version, cross-project, and cross-company defect prediction. In within-version defect prediction, the data come from the same version of the same project, so building a prediction model is relatively simple; however, the class imbalance problem has a significant impact in this scenario. In the cross-version, cross-project, and cross-company scenarios, the training and test data come from different versions, projects, and companies, respectively, which leads to differences in data distribution or feature space. Data distribution differences and feature space heterogeneity therefore directly affect the performance of prediction models in these three scenarios. In addition, factors such as data preprocessing, feature quality, and the choice of prediction model also affect prediction performance. Based on this analysis, this dissertation analyzes in depth the existing problems in each defect prediction scenario and proposes new methods to improve prediction performance. The specific research contents are as follows:

(1) Within-version defect prediction approach based on BiGAN anomaly detection. Most research treats defect prediction as a classification task, which inevitably leads to the class imbalance problem. To address class imbalance in the within-version scenario, a within-version defect prediction approach based on BiGAN anomaly detection is proposed. First, defect prediction is studied from the perspective of anomaly detection, which avoids the impact of class imbalance on prediction performance: because defective samples are generally far fewer than non-defective samples, the defective samples are regarded as anomalies. Second, using the generative learning ability of BiGAN, a generative adversarial network, the feature distribution of the non-defective samples is learned and the prediction model is constructed. Finally, experiments are carried out on 19 open-source projects from three datasets. The results show that the proposed method outperforms other class imbalance learning methods and also achieves higher recall.

(2) Cross-version defect prediction approach based on hierarchical features with deep ensemble learning. Because richer, higher-quality features represent projects better and yield better prediction models, a cross-version defect prediction method based on hierarchical features with deep ensemble learning is proposed. First, a variety of information sources are collected, and different deep learning models are used to automatically extract valuable features from them. Then, all the extracted features are optimized by a filtering mechanism and fused. Finally, the resulting features are used to construct multiple sub-prediction models, and an ensemble learning method integrates these sub-prediction models into the final prediction model. Experiments show that the proposed method achieves better predictive performance in both non-effort-aware and effort-aware contexts.

(3) Cross-project defect prediction approach based on hybrid multiple-model transfer learning. Given the complexity of data distributions across projects, several transfer learning methods are combined, covering three aspects, to reduce the differences in data distribution between projects more effectively. Specifically, the samples, the sample feature values, and the sample weights are each considered, and a corresponding transfer learning method is adopted for each to reduce the distribution differences. The class imbalance problem is addressed from two sides: the source project and the target project. For the source project, an undersampling method is used to iteratively construct multiple balanced training subsets, a sub-prediction model is trained on each, and the final prediction model is obtained by integrating all sub-prediction models. For the target project, an improved version of the transfer learning method TrAdaBoost is used to increase the attention the prediction model pays to defective samples. The experimental results show that the proposed method makes reasonable use of the different types of transfer learning methods to transfer knowledge between the source domain and the target domain, and effectively addresses the class imbalance problem, thereby improving the performance of the cross-project defect prediction model.

(4) Cross-company defect prediction approach based on feature matching and deep transfer learning. Cross-company defect prediction faces two key problems: inconsistent project features and differences in project data distribution between companies. To address them, a cross-company defect prediction approach based on feature matching and deep transfer learning is proposed. The method has two stages. First, a new unified feature representation is defined, which takes into account both the common and the unique features of projects from different companies during feature matching. Second, a deep transfer learning method is used to further reduce the data distribution differences between projects from different companies and to build the prediction model. The experimental results show that the proposed method effectively addresses the two key problems in cross-company defect prediction and improves the performance of the prediction model.
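
The source-project step described in item (3), iteratively undersampling the majority class into balanced training subsets and then integrating the resulting sub-prediction models, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the nearest-centroid base learner, the majority-vote integration, the function names, and the toy data are all assumptions chosen only to keep the example self-contained.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(X, y):
    """Hypothetical base learner: one centroid per class (0 = clean, 1 = defective)."""
    c0 = centroid([x for x, label in zip(X, y) if label == 0])
    c1 = centroid([x for x, label in zip(X, y) if label == 1])
    return lambda x: 1 if sq_dist(x, c1) < sq_dist(x, c0) else 0


def undersampling_ensemble(X, y, n_models=5, seed=0):
    """Train n_models sub-models, each on a balanced subset of the data.

    Assumes label 1 (defective) is the minority class and that there are
    at least as many clean samples as defective ones.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    models = []
    for _ in range(n_models):
        # Draw as many clean samples as there are defective ones.
        sampled = rng.sample(majority, len(minority))
        idx = minority + sampled
        models.append(
            train_nearest_centroid([X[i] for i in idx], [y[i] for i in idx])
        )

    def predict(x):
        # Assumed integration rule: simple majority vote over sub-models.
        votes = sum(model(x) for model in models)
        return 1 if 2 * votes > len(models) else 0

    return predict


# Toy imbalanced data: 20 clean modules, 3 defective modules (made-up values).
clean = [[1.0 + 0.1 * i, 1.0 + 0.05 * i] for i in range(20)]
defective = [[8.0, 9.0], [9.0, 8.0], [8.5, 8.5]]
X = clean + defective
y = [0] * len(clean) + [1] * len(defective)

predict = undersampling_ensemble(X, y)
label_near_defective = predict([8.2, 8.8])
label_near_clean = predict([1.5, 1.2])
```

Each sub-model sees every defective sample but only a random, equally sized draw of clean samples, so no single sub-model is dominated by the majority class, while the ensemble as a whole still covers much of the clean data.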
Keywords/Search Tags: software defect prediction, different scenarios, machine learning, transfer learning, deep learning