Cross-Project Software Defect Prediction Methods Based On Autoencoder

Posted on: 2021-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: J J Li
Full Text: PDF
GTID: 2428330614465817
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Software defect prediction (SDP) is an active research topic in software engineering. Its main goal is to discover defects in software in order to improve software quality. Previous research has mainly focused on within-project defect prediction (WPDP), which uses the historical data of one project to train a prediction model and then predicts the defect proneness of software instances from the same project. However, when the project does not have enough historical data, the performance of WPDP degrades significantly. Cross-project defect prediction (CPDP) offers an alternative: it builds a prediction model from the plentiful historical data of other projects and uses it to predict defects in the instances of a new project. Its prediction performance is nevertheless usually poor, because of the data distribution difference between source and target projects and because of the class imbalance problem. To address these two problems, this thesis applies deep autoencoder technology to CPDP and proposes three methods to improve defect prediction performance.

First, to address the data distribution difference, a shared hidden layer autoencoder method for cross-project defect prediction (SHLA-SDP) is proposed. SHLA-SDP first designs a shared hidden layer autoencoder network, which reduces the feature distribution difference between source and target projects through a hidden layer parameter sharing mechanism. An intra-class compactness loss function is then designed to constrain the features in the common subspace of the hidden layer, which makes features of the same class more compact. Finally, the deep features of the source project are used to construct the defect prediction model, which improves its accuracy.

Second, to address class imbalance and the scarcity of labeled data, a semi-supervised cost-sensitive improved autoencoder method for cross-project defect prediction (CSSHLA-SDP) is proposed. CSSHLA-SDP combines supervised and unsupervised learning in the training of the deep autoencoder: during training, it adds the intra-class compactness loss to the supervised part and the reconstruction loss to the unsupervised part. In addition, cost-sensitive learning is introduced to alleviate the class imbalance problem by assigning different misclassification costs to different kinds of samples, which further improves defect prediction performance.

Third, to obtain intra-class features with better compactness and inter-class features with better separation, an improved focal loss based autoencoder method for cross-project defect prediction (FLSHLA-SDP) is proposed. In the training of the deep autoencoder, FLSHLA-SDP uses an intra-class compactness loss and an inter-class separation loss to make the distributions of the source and target projects more similar in the common subspace. In addition, an improved focal loss function deals with the class imbalance problem by combining class weighting and difficulty weighting: different weights are first applied to samples of different classes, and then, according to how difficult each sample is to classify, different weights are applied to hard-to-classify and easy-to-classify samples.

Compared with five classical CPDP baseline algorithms, the three methods proposed in this thesis improve defect prediction performance in experiments on the RELINK, NASA and AEEEM datasets.
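The combination of class weighting and difficulty weighting mentioned in the abstract can be illustrated with the standard binary focal loss. This is only a minimal sketch under stated assumptions: it is not the improved focal loss of the thesis, whose exact form the abstract does not give. The function name `focal_loss` and the defaults `alpha=0.75` and `gamma=2.0` are hypothetical choices for illustration, with label 1 standing for the defective (minority) class.

```python
import math


def focal_loss(probs, labels, alpha=0.75, gamma=2.0, eps=1e-12):
    """Mean standard binary focal loss (illustrative, not the thesis's variant).

    probs  : predicted probabilities of the defective class (label 1)
    labels : ground-truth labels, 0 (clean) or 1 (defective)
    alpha  : class weight for label 1; label 0 gets 1 - alpha (assumed value)
    gamma  : focusing parameter; larger values down-weight easy samples
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class weighting
        difficulty = (1.0 - p_t) ** gamma         # difficulty weighting
        total += -alpha_t * difficulty * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # A hard defective sample (p_t = 0.3) versus an easy one (p_t = 0.9).
    print(focal_loss([0.3], [1]), focal_loss([0.9], [1]))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the ordinary cross-entropy, which makes the effect of the two weighting terms easy to isolate.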
Keywords/Search Tags: Deep autoencoder, shared hidden layer mechanism, cost-sensitive learning, focal loss, cross-project defect prediction