Research On Class-imbalanced Cross-project Defect Prediction Based On Adversarial Learning

Posted on:2023-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:X H Zheng

Full Text:PDF

GTID:2568306836476864

Subject:Control engineering

Abstract/Summary:

Cross-Project Defect Prediction (CPDP) is a good solution when the historical data of a software project is missing or insufficient. Although existing CPDP methods have achieved good performance, some challenging problems remain. On the one hand, traditional CPDP methods tend to destroy discriminative features when reducing the differences in data distribution among projects, and they do not fully utilize the label information of labeled instances or the pseudo-label information of unlabeled instances. On the other hand, some CPDP methods leave room for improvement in dealing with the class imbalance problem. This thesis studies these problems, and the main research content is as follows.

First, in order to reduce the data distribution differences among projects and to make full use of the label information of labeled instances and the pseudo-label information of unlabeled instances, a cross-project defect prediction method based on Domain Adaptation and Pseudo-label Learning (DAPL) is proposed. The method first trains a classifier with labeled instances and uses it to output pseudo-labels for unlabeled instances. It then reduces the data distribution differences between projects by using the pseudo-label information output by the classifier and an auxiliary classifier, and finally ensures that the pseudo-labels of unlabeled instances are reliable through pseudo-label learning. At the same time, the DAPL method makes full use of the label information of labeled instances, so that the relevance of instances with the same label increases while the relevance of instances with different labels decreases.

Second, in order to fully exploit the potential relationship between the source and target projects and thereby better reduce the data distribution differences between them, a cross-project defect prediction method based on Generative Adversarial Networks with Domain Mixup (GANDM) is proposed. The GANDM method consists of four components: a feature encoder, a feature decoder (generator), a discriminator, and a classifier. The method strengthens the discriminator by constructing an intermediate domain, which indirectly strengthens the data generation ability of the generator, and the continued adversarial game between the two effectively reduces the differences in data distribution among projects. At the same time, the feature encoder and the generator form an autoencoder, so that the model can learn more effective feature representations. In addition, to reduce the impact of class imbalance on model training, cost-sensitive learning is introduced to reduce the overfitting of the model to the majority class, which helps improve prediction performance.

Finally, a cross-project defect prediction method based on a Meta-weight Network (MWN) is proposed to further reduce the impact of the class imbalance problem. By introducing a meta-weight network, the method automatically assigns a weight to the cross-entropy loss of each instance, avoiding the need to manually pre-specify a weighting function and its additional hyper-parameters. The MWN method extracts a certain percentage of balanced data from the initial source project training set as the meta-data set, which is then used to train the meta-weight network. In addition, the MWN method provides separate individual networks for the source and target projects, which learn the individual feature representations of each project. This enhances the feature reconstruction capability of the autoencoder, helps the model learn more effective feature representations, and thus improves its prediction performance.

The three proposed methods, DAPL, GANDM and MWN, are evaluated in extensive experiments on the PROMISE dataset and compared with some existing CPDP methods. The experimental results show that the proposed methods achieve better performance than the compared methods.
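
Two of the ideas named in the abstract, domain mixup and cost-sensitive learning, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names, the Beta(alpha, alpha) sampling of the mixing ratio, and the inverse-frequency class weights are all assumptions chosen for illustration; the abstract does not specify how the intermediate domain or the misclassification costs are actually defined.

```python
import math
import random


def sample_lambda(alpha=2.0):
    """Draw a mixing ratio in (0, 1) from Beta(alpha, alpha) (assumed choice)."""
    return random.betavariate(alpha, alpha)


def mixup_features(source, target, lam):
    """Interpolate one source and one target feature vector.

    The result is a point in an intermediate domain; lam can also serve
    as the soft domain label (1.0 = pure source, 0.0 = pure target).
    """
    return [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]


def class_weights(labels):
    """Inverse-frequency weights (assumed scheme): rarer classes cost more."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean binary cross-entropy with a per-class cost.

    probs[i] is the predicted probability that instance i is defective
    (label 1); labels[i] is 0 (clean) or 1 (defective).
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total += -weights[y] * math.log(max(p_true, eps))
    return total / len(labels)


# Hypothetical toy data: three clean modules and one defective module.
labels = [0, 0, 0, 1]
weights = class_weights(labels)
mixed = mixup_features([1.0, 0.0], [0.0, 1.0], sample_lambda())
loss = weighted_cross_entropy([0.2, 0.1, 0.3, 0.4], labels, weights)
```

With these weights, a confident mistake on the minority (defective) class contributes more to the loss than an equally confident mistake on the majority class, which is the effect cost-sensitive learning is meant to achieve.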

Keywords/Search Tags:

cross-project defect prediction, generative adversarial networks, class imbalance, pseudo-labeling, domain mixup

Related items

1	Research On Data Pre-processing Methods For Cross-project Software Defect Prediction
2	Research On Cross-Project Software Defect Prediction Method Based On Machine Learning
3	Research On Key Technologies Of Cross-Project Heterogeneous Defect Prediction
4	Software Defect Prediction Strategy Design For Imbalanced Data
5	Study On Cross-Project Defect Prediction Based On Transfer Learning
6	Research On Cross-project Software Defect Prediction Method Based On Active Learning
7	Research On Software Defect Prediction Based On Improved Balanced Distribution Adaption Algorithm
8	Research On Data Preprocessing Technology In Cross Project Software Defect Prediction
9	Design And Tool Implementation Of Cross-project Software Defect Prediction Method Via Active Transfer Learning
10	Correlation Analysis Based Cross-project Software Defect Prediction