Static Metrics Based Cross-Project Software Defect Prediction

Posted on: 2020-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: L Huang
GTID: 2428330590496065
Subject: Control engineering
Abstract/Summary:
Software defect prediction (SDP) is an active research topic in software engineering. It aims to find effective ways to predict the defect proneness of a given software project. A predictive model is built by mining software archives and extracting the corresponding metrics, and is then used to predict defects in new files. A new project usually lacks enough historical data to train such a model, which is why Cross-Project Defect Prediction (CPDP) was introduced. CPDP uses a similar project from the same company (the source project) as training data to build a predictive model, and the current project (the target project) serves as the test set whose modules are predicted to be defective or not. However, the source and target projects in CPDP follow different data distributions, and most defect data have a complicated structure and are markedly class-imbalanced. To address these problems, this thesis tackles the difficulties of cross-project software defect prediction in order to improve the forecasting ability of the predictive model.

Firstly, a software defect prediction method based on neighbor preserving embedding canonical correlation analysis (NPE-CCA) is proposed. The method first uses the data gravitation technique to convert information about the target project samples into weights for the source project samples, which yields a weight vector over the source project samples. Then, to maximize the correlation between the two data sets, it uses canonical correlation analysis (CCA) to find a common space for the original source project samples and the target project samples. Finally, the NPE algorithm reduces the dimensionality of the data while preserving its neighborhood geometry, and the result is combined with the previously computed weights to obtain the final training samples. For classification, a support vector machine (SVM) is used to perform one-to-one cross-project software defect prediction.

Secondly, a software defect prediction method based on neighbor preserving embedding kernel canonical correlation analysis (NPE-KCCA) is proposed. This method is an improvement on NPE-CCA: the kernel method is introduced into CCA to handle the separability problem of nonlinear data in software defect prediction, so that data that are not linearly separable can be separated. This improves the performance of the prediction model.

Finally, a software defect prediction method based on cost-sensitive transfer multi-kernel ensemble learning (CTMKEL) is proposed. The data gravitation method is first used to make the distributions of the source project samples and the target project samples similar. Multi-kernel learning is then introduced: for each kernel function, the processed features are mapped into a high-dimensional space and an SVM classifier is trained on the mapping, which yields multiple kernel-based classifiers. To avoid the parameter complexity of multi-kernel learning, each kernel-based weak classifier is trained by boosting. A cost matrix is introduced into the weight-updating process of boosting to account for the two types of misclassification cost.

The experiments mainly use the NASA, ReLink, and AEEEM data sets. Compared with the baseline algorithms, the experimental results show that the proposed algorithms improve classification performance.
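
As a rough illustration of the data gravitation weighting step that both NPE-CCA and CTMKEL start with, the Python sketch below assumes one common formulation from the cross-project defect prediction literature: each source sample is scored by the number s of its metric values that fall inside the [min, max] range observed for that metric in the target project, and its weight is s / (k - s + 1)^2, where k is the number of metrics. The abstract does not give the formula, so this formulation, the function name `gravitation_weights`, and the toy data are assumptions for illustration and not the thesis's implementation.

```python
from typing import List, Sequence


def gravitation_weights(source: Sequence[Sequence[float]],
                        target: Sequence[Sequence[float]]) -> List[float]:
    """Weight each source sample by its similarity to the target project.

    Assumed formulation (not stated in the abstract): for a source sample,
    s = number of metrics whose value lies within the target's [min, max]
    range for that metric, k = number of metrics, weight = s / (k - s + 1)^2.
    """
    if not source or not target:
        raise ValueError("source and target must be non-empty")
    k = len(target[0])
    # Per-metric minimum and maximum over the target project.
    lows = [min(row[j] for row in target) for j in range(k)]
    highs = [max(row[j] for row in target) for j in range(k)]

    weights = []
    for row in source:
        # Count the metrics of this sample that fall inside the target range.
        s = sum(1 for j in range(k) if lows[j] <= row[j] <= highs[j])
        weights.append(s / (k - s + 1) ** 2)
    return weights


# Toy data: three metrics per module (values are made up).
target_project = [[10.0, 2.0, 0.5], [20.0, 4.0, 0.9]]
source_project = [
    [15.0, 3.0, 0.7],  # all three metrics inside the target ranges
    [15.0, 9.0, 0.7],  # two metrics inside
    [99.0, 9.0, 5.0],  # none inside
]
weights = gravitation_weights(source_project, target_project)
print(weights)  # [3.0, 0.5, 0.0]
```

Source samples that resemble the target project receive larger weights. One plausible way to use such weights is to pass them as per-sample weights when training the classifier, for example through the `sample_weight` argument that scikit-learn's `SVC.fit` accepts; how the thesis actually combines them with the projected features is not specified in the abstract.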
Keywords/Search Tags: software defect prediction, transfer learning, canonical correlation analysis, cost-sensitive learning, neighbor preserving embedding