| Cross-Project Defect Prediction (CPDP) is a good solution when the historical data of a software project is missing or insufficient. Although existing CPDP methods have achieved good performance, some challenging problems remain. On the one hand, traditional CPDP methods tend to destroy discriminative features when reducing the differences in data distribution among projects, and they leave room for improvement in fully utilizing the label information of labeled instances and the pseudo-label information of unlabeled instances. On the other hand, some CPDP methods leave room for improvement in handling the class imbalance problem. This paper studies these problems, and the main research content is as follows. First, to reduce the data distribution differences among projects and to make full use of the label information of labeled instances and the pseudo-label information of unlabeled instances, a cross-project defect prediction method based on Domain Adaptation and Pseudo-label Learning (DAPL) is proposed. The method first trains a classifier on labeled instances and uses it to output pseudo-labels for unlabeled instances, then reduces the data distribution differences between projects using the pseudo-label information output by the classifier and an auxiliary classifier, and finally ensures reliable pseudo-labels for unlabeled instances through pseudo-label learning. At the same time, DAPL makes full use of the label information of labeled instances, so that the relevance of instances with the same label increases while the relevance of instances with different labels decreases. Second, to fully exploit the potential relationship between the source and target projects and thereby better reduce the data distribution differences between them, a cross-project defect prediction method based on Generative Adversarial Networks with Domain Mixup (GANDM) is proposed. The GANDM method consists of 
four components: a feature encoder, a feature decoder (generator), a discriminator, and a classifier. The method enhances the discriminative power of the discriminator by constructing an intermediate domain, which indirectly enhances the data generation power of the generator, and effectively reduces the differences in data distribution among projects through the continuing adversarial game between the two. At the same time, the feature encoder and the generator form an autoencoder so that the model can learn more effective feature representations. In addition, to reduce the impact of class imbalance on model training, cost-sensitive learning is introduced to reduce the model's overfitting to the majority class, thus improving its prediction performance. Finally, a cross-project defect prediction method based on a Meta-weight Network (MWN) is proposed to further reduce the impact of the class imbalance problem. By introducing a meta-weight network, the method automatically assigns a weight to the cross-entropy loss of each instance, avoiding the need to manually pre-specify a weighting function and its additional hyper-parameters. The MWN method extracts a certain percentage of balanced data from the initial source project training set as the meta-data set, which is then used to train the meta-weight network. In addition, the MWN method provides separate individual networks for the source and target projects to learn the individual feature representations of each project, which enhances the feature reconstruction capability of the autoencoder, helps the model learn more effective feature representations, and thus improves its prediction performance. The three proposed methods, DAPL, GANDM and MWN, are evaluated extensively on the PROMISE dataset and compared with some existing CPDP 
methods. The experimental results show that the proposed methods achieve better prediction performance. |
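
Two of the ideas summarized above, the intermediate domain built by domain mixup and the cost-sensitive treatment of class imbalance, can be illustrated with a minimal sketch. It is not the implementation from this work: the convex-combination form of the mixup, the inverse-frequency weighting scheme, and the function names `mixup_domains`, `inverse_frequency_weights` and `cost_sensitive_cross_entropy` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def mixup_domains(source_vec, target_vec, lam):
    """Build one intermediate-domain feature vector.

    Assumption: the intermediate domain is a convex combination
    lam * source + (1 - lam) * target, with 0 <= lam <= 1.
    """
    return [lam * s + (1.0 - lam) * t for s, t in zip(source_vec, target_vec)]


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = N / (K * n_c).

    Assumption: this common inverse-frequency scheme stands in for the
    misclassification costs, which the abstract does not specify.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def cost_sensitive_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted mean binary cross-entropy.

    probs[i] is the predicted probability that instance i is defective
    (label 1); weights maps each class label to its cost.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1.0 - p)
        total += weights[y] * loss
    return total / len(labels)


if __name__ == "__main__":
    # Hypothetical toy data: 8 clean modules (0) and 2 defective modules (1).
    labels = [0] * 8 + [1] * 2
    weights = inverse_frequency_weights(labels)
    print(weights)
    print(mixup_domains([0.0, 2.0], [1.0, 4.0], lam=0.5))
    print(cost_sensitive_cross_entropy([0.1], [1], weights))
```

With these weights, an equally confident mistake on a defective (minority) instance contributes more to the loss than one on a clean (majority) instance, which is the effect cost-sensitive learning relies on. A meta-weight network replaces the fixed `weights` table with per-instance weights learned from a balanced meta-data set; that part is not sketched here.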