Design And Implementation Of Instance Selection Based Ensemble Cross-project Defect Prediction Method

Posted on: 2018-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: L P Wang
GTID: 2428330569495354
Subject: Computer technology
Abstract/Summary:
Software defect prediction is one of the most important research areas in software engineering data mining. It first mines historical software repositories and then, by analyzing the source code or the development process, designs a set of metrics related to software defects, which are used to build a prediction model. In actual software development, however, the project to be predicted (i.e., the target project) may be a new project or may have too little training data. The problem of how to effectively transfer knowledge from source projects to build a defect prediction model for the target project is called cross-project defect prediction.

This thesis focuses on homogeneous cross-project defect prediction, which assumes that the source and target projects use the same set of metrics. We propose BCEL, an ensemble learning approach based on the Box-Cox transformation. The method has four stages. In the first stage, different distance measures (Euclidean distance, cosine similarity, and the correlation coefficient) are used for instance selection, producing a different training set from the candidate set for each measure. In the second stage, the Box-Cox transformation is applied to these training sets to normalize the metric values. In the third stage, a single classification method (logistic regression) is used to build a base classifier on each training set, and the diversity of their prediction results is analyzed. In the fourth stage, if the prediction results are diverse, ensemble learning is used to combine the base classifiers and further improve prediction performance.

The empirical study mainly uses the AEEEM dataset and evaluates model performance with the F-measure. Three baseline methods are chosen, each of which selects instances with only one distance measure: ED denotes instance selection based on Euclidean distance, CS denotes instance selection based on cosine similarity, and CC denotes instance selection based on the correlation coefficient. The experimental results show that BCEL provides better performance for cross-project defect prediction: it improves on ED by 35.9%, on CS by 20.5%, and on CC by 24%. In addition, a prototype tool incorporating the BCEL ensemble cross-project defect prediction framework is designed and implemented.
Keywords/Search Tags: software defect prediction, cross-project defect prediction, instance selection, ensemble learning, empirical study