Design And Tool Implementation Of Cross-project Software Defect Prediction Method Via Active Transfer Learning

Posted on: 2021-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z D Yuan
GTID: 2518306482985919
Subject: Computer technology
Abstract/Summary:
During software development, defect prediction methods are often used to identify the modules in a project that are likely to contain defects. Traditional software defect prediction builds a prediction model by extracting metrics from software modules and labeling each module as defective or non-defective. For a new project, the modules are easy to measure, but labeling them is expensive. The current mainstream solution is to build the defect prediction model from already-labeled modules of other projects and then use that model to predict the modules of the new project. This process is called cross-project software defect prediction. However, because projects differ in programming language, application domain, and development process, the data distribution of the other projects can differ greatly from that of the new project, and this difference degrades the performance of the trained model. Existing cross-project defect prediction methods mainly use transfer learning to address this problem, but transfer learning on its own yields only limited improvement. Recently, some researchers have proposed using active learning to improve within-project and cross-version defect prediction. To the best of our knowledge, this work is the first to combine active learning and transfer learning for cross-project defect prediction, with the aim of improving the performance of the resulting model.

We design the ALTRA method. To reduce the difference between the data distribution of the other project and that of the new project, we first analyze the unlabeled modules in the new project and use the Burak filter, which is based on the k-nearest-neighbor method, to select similar labeled modules from the other project. We then use active learning to choose representative unlabeled modules from the new project and ask experts to label each of them as defective or non-defective. Next, we use TrAdaBoost to determine the weights of the labeled modules from both the other project and the new project, and build the model with a weighted support vector machine. This active transfer learning process is repeated until 5% of the modules in the new project have been selected, at which point the method terminates and returns the final model. A prototype tool based on this method is also implemented.

To evaluate ALTRA, we choose 10 large-scale open-source projects from different application domains and compare ALTRA with 7 state-of-the-art cross-project defect prediction baselines, using F1 and AUC as performance measures. The empirical results show that ALTRA performs significantly better than the baselines. We also conduct empirical studies to verify the design of each step of ALTRA. The results show that the Burak filter, the uncertainty-based active learning strategy, the class imbalance learning method, and TrAdaBoost are all competitive choices for their respective steps in ALTRA.
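The two selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names (`burak_filter`, `select_uncertain`, `labeling_budget`), the Euclidean distance, the default of k = 10 neighbors, and the reading of "uncertainty" as "predicted defect probability closest to 0.5" are all assumptions made for illustration; the abstract does not state these details. The weighting step and the weighted support vector machine are not covered here.

```python
import math


def burak_filter(other_modules, new_modules, k=10):
    """Return sorted indices of labeled modules from the other project that
    are among the k nearest neighbors of at least one new-project module.

    Each module is a list of metric values. Euclidean distance and k=10
    are assumptions, not values taken from the abstract.
    """
    selected = set()
    for target in new_modules:
        # Rank every other-project module by its distance to this module.
        ranked = sorted(
            range(len(other_modules)),
            key=lambda i: math.dist(other_modules[i], target),
        )
        selected.update(ranked[:k])
    return sorted(selected)


def select_uncertain(defect_probabilities, count):
    """Return indices of the `count` unlabeled modules whose predicted
    defect probability is closest to 0.5, i.e. the ones the current model
    is least sure about (an assumed form of uncertainty sampling).
    """
    ranked = sorted(
        range(len(defect_probabilities)),
        key=lambda i: abs(defect_probabilities[i] - 0.5),
    )
    return ranked[:count]


def labeling_budget(n_new_modules, percent=5):
    """Number of new-project modules an expert labels before stopping.

    The 5% figure comes from the abstract; rounding down and the minimum
    of one module are assumptions.
    """
    return max(1, n_new_modules * percent // 100)


if __name__ == "__main__":
    other = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [10.0, 10.0]]
    new = [[0.9, 0.9], [5.2, 5.1]]
    print(burak_filter(other, new, k=1))
    print(select_uncertain([0.95, 0.52, 0.10, 0.45], 2))
    print(labeling_budget(200))
```

In a full loop, the probabilities passed to `select_uncertain` would come from the model trained in the previous iteration, and the loop would stop once `labeling_budget` modules from the new project have been labeled.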
Keywords/Search Tags:Cross-project software defect prediction, Active learning, Transfer learning, Class imbalance learning