Research on Software Defect Prediction Methods Under Different Scenarios

Posted on: 2020-02-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Xu
GTID: 1368330620452209
Subject: Computer software and theory
Abstract/Summary:
Software products have been integrated into every aspect of our daily lives. However, owing to various factors in software design, development, and configuration, defects are inevitable. Defects hidden in software modules threaten the security and reduce the reliability of software products, so it is essential to detect and fix defective modules before delivery. As software grows in scale and complexity, improving software quality becomes an increasingly challenging task for developers and testers. Because limited testing resources usually cannot support thorough code reviews, inspection must be prioritized: developers and testers should allocate their limited resources to the modules that are most likely to contain defects. To support such prioritization, researchers have proposed software defect prediction, which identifies high-risk modules for priority inspection.

The most widely studied defect prediction methods are supervised models, which first train a classification model on labeled software modules and then use it to determine whether unlabeled modules contain defects. Supervised models need labeled modules, drawn from the historical data of the current project or from external projects, as the training set. According to the source of the training set, supervised defect prediction can be divided into three scenarios: within-version, cross-version, and cross-project defect prediction. In these scenarios the training set comes from the same version of a project, the previous version of the project, and other external projects, respectively. This dissertation studies new machine-learning-based techniques that address the distinct problems of the three scenarios, aiming to further improve the performance of defect prediction. The research contents are as follows:

(1) To learn a more discriminative feature representation and to address the inherent class imbalance of defect data, this dissertation proposes a within-version defect prediction framework that combines kernel principal component analysis with a weighted extreme learning machine. The framework first uses kernel principal component analysis to map the training set and the test set, separately, into a high-dimensional feature space. This mapping makes it easier to distinguish modules that are linearly inseparable in the original feature space. The framework then uses the mapped training set to build a classification model based on a weighted extreme learning machine, which predicts the labels of the mapped test set. The classification model addresses class imbalance by assigning different weights to defective and non-defective software modules. We conduct experiments on ten projects in the NASA dataset and five projects in the AEEEM dataset, and use six indicators to evaluate the framework. The results show that the proposed within-version framework generally outperforms its variants, some feature selection methods, and class-imbalanced learning methods.

(2) To select, from the previous version, the subset of software modules that is best suited as a training set for the data of the current version, this dissertation proposes a two-stage training subset selection framework for cross-version defect prediction. The framework first uses the sparse modeling representation selection method to filter out uninformative software modules, keeping those that minimize the error of reconstructing the original data. Because this stage does not rely on software modules from the current version, it is a self-simplification stage. Then, with the participation of data from the current version, the framework uses the dissimilarity-based sparse subset selection method to further select, from the modules retained in the first stage, a subset that effectively represents the data of the current version. A model built on the final selected subset is better targeted to the data of the current version. Because this stage requires software modules from the current version, it is an auxiliary refining stage. We conduct experiments on 67 versions from 17 projects in the PROMISE dataset, again using six indicators to evaluate the framework. The results show that, across a total of 50 cross-version pairs, the overall performance of the proposed cross-version framework is superior to that of other training subset selection methods and of the variant based on one-stage training subset selection.

(3) To further narrow the distribution difference between the data of two projects, this dissertation proposes a new transfer-learning-based cross-project defect prediction framework by introducing a state-of-the-art balanced distribution adaptation model. Unlike previous transfer-learning-based cross-project defect prediction models, which considered only the marginal distribution differences across data, this model considers both the marginal and the conditional distribution differences. In addition, because the similarity between cross-project data affects the relative importance of the two distribution differences, the model assigns weights to the two differences so that it can adapt to different cross-project pairs. We also investigate the impact of six data normalization strategies on the performance of this framework. We conduct experiments on five projects in the NASA dataset and five projects in the AEEEM dataset, again using six indicators to evaluate the framework. The results show that, across a total of 40 cross-project pairs, the proposed cross-project defect prediction framework performs better overall than other transfer-learning-based and training-data-filter-based cross-project models.

In conclusion, this dissertation addresses difficult problems in different software defect prediction scenarios and proposes new frameworks that improve defect prediction performance by incorporating recent machine learning techniques. It expands the application of machine learning in software engineering and provides new solutions for software defect prediction, which is of great significance for software quality assurance activities.
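
The weighting idea in contribution (3) can be illustrated with a minimal sketch. It assumes, for illustration only, that each distribution difference is measured as the squared distance between feature means (a linear-kernel maximum mean discrepancy) and that the unlabeled target project carries pseudo-labels from some base classifier; the abstract specifies neither. The function names, the toy data, and the balance factor mu are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def group_by_label(rows, labels):
    """Collect feature vectors per class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def balanced_discrepancy(src_x, src_y, tgt_x, tgt_pseudo_y, mu):
    """Weighted sum of marginal and conditional mean discrepancies.

    mu in [0, 1] is the (assumed) balance factor: mu = 0 uses only the
    marginal term, mu = 1 uses only the class-conditional term.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")

    # Marginal term: distance between the overall feature means.
    marginal = squared_distance(mean_vector(src_x), mean_vector(tgt_x))

    # Conditional term: per-class mean distances, summed over the classes
    # that appear in both the source labels and the target pseudo-labels.
    src_groups = group_by_label(src_x, src_y)
    tgt_groups = group_by_label(tgt_x, tgt_pseudo_y)
    conditional = 0.0
    for label in src_groups.keys() & tgt_groups.keys():
        conditional += squared_distance(
            mean_vector(src_groups[label]), mean_vector(tgt_groups[label])
        )

    return (1.0 - mu) * marginal + mu * conditional


# Toy data: label 1 = defective, label 0 = non-defective (illustrative only).
source_x = [[0.0, 0.0], [2.0, 0.0]]
source_y = [0, 1]
target_x = [[1.0, 0.0], [3.0, 0.0]]
target_pseudo_y = [0, 1]

score = balanced_discrepancy(source_x, source_y, target_x, target_pseudo_y, mu=0.5)
```

With mu = 0 only the marginal term counts, which corresponds to the earlier transfer models that the abstract contrasts against; a larger mu gives more weight to the class-conditional term. How mu is chosen for each cross-project pair is not described in the abstract, so it is left as a plain parameter here.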
Keywords/Search Tags: software defect prediction, feature learning, class imbalanced learning, training subset selection, transfer learning