
Research On Cross-Project Software Defect Prediction Method Based On Machine Learning

Posted on: 2024-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: W T Lin
Full Text: PDF
GTID: 2568306914972449
Subject: Control Science and Engineering
Abstract/Summary:
Software defect prediction is an effective means of assuring software quality. In practice, however, a project often lacks sufficient training data, and cross-project defect prediction is needed to make up for this deficiency. Cross-project defect prediction trains the model on data from other projects and thereby alleviates the problem of insufficient data. Current research on cross-project defect prediction has two key problems to solve: the difference in feature distribution between the source project and the target project, and class imbalance. This thesis studies these two problems. The research content is as follows.

(1) This thesis proposes a cross-project defect prediction model based on feature selection and ensemble learning. In the domain adaptation stage, a feature selection method based on the Pearson correlation coefficient looks for similar features between the source project and the target project, which reduces the difference in feature distribution between them. In the classification stage, the model integrates several representative base classifiers by majority voting. Because the classifiers in a majority vote correct one another, the effect of class imbalance is reduced.

(2) This thesis proposes a cross-project defect prediction model based on two-stage feature amplification. In the domain adaptation stage, the model introduces the idea of semi-supervised learning and uses a feature search technique based on greedy search to carry out feature transfer and to amplify the features that correlate strongly with the class. In this way, the relationship between classes is taken into account in addition to the learned relationship between features and classes. By constructing a feature space specific to the target project, the model makes the feature distributions of the source and target projects more similar. In the classification stage, random forest, an ensemble learning method that is sensitive to the feature-class relationship, is selected as the classifier to further amplify the role of features in classification and to reduce the influence of class imbalance.

Based on the above research, this thesis designed 7 experiments on the public AEEEM data set for the two proposed methods, 140 groups of experiments in total, including ablation and comparison experiments, to test the proposed models. The experimental results show that the proposed methods mitigate the influence of the feature distribution difference and of class imbalance between source and target projects, and that they improve the overall performance of cross-project defect prediction models.
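
The two stages of model (1) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract does not say how the Pearson correlation coefficient is applied across projects, so the sketch assumes one plausible reading: each feature is summarised by a short quantile profile in both projects, and a feature is kept when the two profiles correlate strongly. The function names, the five-point profile, and the 0.9 threshold are all hypothetical. The base classifiers are not modelled; only their predicted labels are combined.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence gives no usable correlation
    return cov / sqrt(var_x * var_y)


def quantile_profile(values, points=5):
    """Evenly spaced order statistics that summarise a feature's distribution.

    Assumption: this profile stands in for "the feature" so that projects
    with different numbers of instances can be compared.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(i * last / (points - 1))] for i in range(points)]


def select_similar_features(source, target, threshold=0.9):
    """Return the feature names whose source and target profiles correlate.

    `source` and `target` map a feature name to its list of metric values.
    The threshold is a hypothetical choice, not a value from the thesis.
    """
    kept = []
    for name, values in source.items():
        if name not in target:
            continue
        r = pearson(quantile_profile(values), quantile_profile(target[name]))
        if r >= threshold:
            kept.append(name)
    return kept


def majority_vote(predictions):
    """Combine one label list per classifier into one label per instance."""
    combined = []
    for votes in zip(*predictions):
        # The most frequent label wins; use an odd number of classifiers
        # to avoid ties on binary labels.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

For example, `majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])` combines three classifiers' labels for three instances, so a single classifier's mistake on an instance is outvoted by the other two.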
Keywords/Search Tags: cross-project defect prediction, machine learning, domain adaptation, feature distribution difference, class imbalance