Research On Cross-Project Software Defect Prediction

Posted on: 2022-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: D L Xing
Full Text: PDF
GTID: 2518306557968639
Subject: Computer technology
Abstract/Summary:
Cross-project defect prediction (CPDP) is a feasible way to build an accurate prediction model when a project lacks sufficient historical data, and it has therefore been a research hotspot in recent years. Although existing CPDP methods achieve good prediction results, there is still room to improve prediction performance. On the one hand, most existing CPDP methods cannot make full use of the available label information and reduce the distribution difference between data sets at the same time. On the other hand, recent work that jointly uses semantic features extracted from source code and traditional software metrics to build a prediction model cannot extract features from source code effectively. This thesis addresses these shortcomings. The main research contents are as follows.

Firstly, to make full use of the available label information and reduce the distribution difference simultaneously, a method named semi-supervised discriminative distribution matching (SDDM) is proposed. The method uses a classifier trained on labeled data to infer labels for the unlabeled data (the test data) and updates these labels iteratively during training. It reduces the differences of both the marginal distributions and the conditional distributions. Moreover, it uses the label information to make the feature vectors of modules from the same class as close as possible and the feature vectors of modules from different classes as far apart as possible, and thereby learns discriminative features for modules of different classes.

Secondly, to extract features from source code effectively, a method named fusing network with software metrics and semantic features (FNSS) is proposed. The method first uses a convolutional neural network to extract semantic features from the source code of the source and target projects. Next, FNSS uses the encoder of an auto-encoder to obtain feature representations of the semantic features and the software metrics whose distributions are as similar as possible. These feature representations retain the information of the original features well. FNSS then fuses the feature representations and finally predicts labels with a neural network classifier.

Finally, to extract features from source code still more effectively, a method named fusing network based on adversarial learning (FNAL) is proposed. The method builds on FNSS and adds adversarial learning. FNAL consists of three major parts: a feature extractor, a feature discriminator and a feature transformer. The feature extractor attempts to obtain feature representations that retain the information of the original features well. The feature discriminator attempts to distinguish the feature representations of semantic features from those of software metrics. The feature transformer attempts to learn discriminative features for modules of different classes. By playing a min-max game between the feature discriminator and the feature transformer, FNAL reduces the distribution difference between semantic features and software metrics and so maximizes their correlation.

Extensive experiments are conducted on 10 projects from the PROMISE dataset. Compared with existing representative methods, SDDM, FNSS and FNAL improve prediction performance by at least 2%, 4% and 5%, respectively. The experimental results indicate the effectiveness of the proposed approaches.
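
The pseudo-labeling loop and the two distribution gaps described for SDDM can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not specify the classifier, the distance measure or the feature mapping, so the sketch assumes a nearest-centroid classifier and a linear-kernel maximum mean discrepancy (the squared distance between feature means), and it learns no feature transformation. All function names are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def marginal_gap(source, target):
    """Assumed measure: linear-kernel MMD, i.e. squared distance of the means."""
    return sq_dist(mean_vector(source), mean_vector(target))


def conditional_gap(source, src_labels, target, tgt_labels):
    """Sum of per-class mean gaps; a class absent on either side is skipped."""
    total = 0.0
    for c in set(src_labels):
        s = [x for x, y in zip(source, src_labels) if y == c]
        t = [x for x, y in zip(target, tgt_labels) if y == c]
        if s and t:
            total += marginal_gap(s, t)
    return total


def class_centroids(rows, labels, classes):
    """Mean feature vector of each class."""
    return {c: mean_vector([x for x, y in zip(rows, labels) if y == c])
            for c in classes}


def predict(centroids, rows):
    """Assign each row to the class with the nearest centroid."""
    return [min(centroids, key=lambda c: sq_dist(centroids[c], x)) for x in rows]


def pseudo_label(source, src_labels, target, rounds=5):
    """Infer target labels from source data, then refine them iteratively."""
    classes = sorted(set(src_labels))
    labels = predict(class_centroids(source, src_labels, classes), target)
    for _ in range(rounds):
        # Re-fit on the source data plus the currently pseudo-labeled target data.
        centroids = class_centroids(source + target, src_labels + labels, classes)
        new_labels = predict(centroids, target)
        if new_labels == labels:  # labels stopped changing
            break
        labels = new_labels
    return labels


# Toy data: two metrics per module, label 1 = defective (illustrative only).
source = [[0, 0], [0, 1], [5, 5], [5, 6]]
src_labels = [0, 0, 1, 1]
target = [[1, 1], [1, 2], [6, 6], [6, 7]]

tgt_labels = pseudo_label(source, src_labels, target)
print(tgt_labels)
print(marginal_gap(source, target))
print(conditional_gap(source, src_labels, target, tgt_labels))
```

In a full method of this kind, the two gaps would serve as terms of a training objective over a learned feature mapping rather than being merely reported, and the pseudo-labels would be recomputed after each update of that mapping.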
Keywords/Search Tags: cross-project defect prediction, semi-supervised learning, semantic feature learning, adversarial learning