
Research On Software Fault Prediction Method Based On Transfer Learning And PU Learning

Posted on: 2018-04-12   Degree: Master   Type: Thesis
Country: China   Candidate: R T Ma
GTID: 2348330515950420   Subject: Engineering
Abstract/Summary:
With the continuous development of artificial intelligence, machine learning has been applied to software fault prediction. Software fault prediction based on traditional machine learning requires a large number of labeled samples to build a model. In practice, software fault data are usually obtained through manual testing, which is time-consuming and costly. To reduce the number of labeled samples that traditional supervised fault prediction methods require, this thesis studies PU (positive and unlabeled) learning and transfer learning, and proposes a method for the PU scenario that transfers knowledge from cross-company or cross-project fault data in order to predict the fault samples of the target. The specific work is as follows:

(1) Instance transfer based on random forest in PU scenarios (the POSTRF algorithm)
In the PU scenario, following the idea of Bayesian inter-class transfer, the samples to be predicted form the target-domain dataset and the cross-company or cross-project software fault samples form the source-domain dataset. The source-domain dataset is sampled with replacement, and each sample set is used to train a PU random decision tree. Sample weights are computed from each tree's AUC and its sample set. Source samples whose distribution is similar to that of the target-domain data are transferred and combined with the target-domain data into a PU dataset, on which a POSC4.5 model is built to predict the software fault samples in the target domain.
In detail, the algorithm first samples the source-domain dataset at the ratio bagSize to obtain M sample sets and trains M PU random decision trees. It randomly samples 75% of the target-domain dataset as a test set and uses each tree's AUC on this test set as the weight of that tree. The samples in each tree's sample set are weighted by the tree's weight, and these weights are aggregated over all sample sets to give each sample its final weight. According to the transfer ratio r, the highest-weighted samples are selected and transferred into the target set. A PU dataset is then built from the target-domain data and the transferred samples under the completely random assumption. The uncertain information gain is calculated from the sizes of the positive and unlabeled sample sets together with the positive-class prior probability; attributes are selected by this gain, the tree model is built recursively, and the fault samples in the target domain are predicted.

(2) Experiments on the POSTRF algorithm
Eight software fault datasets from the NASA database were collected as experimental datasets. The KC3 and CM1 datasets were used as the target-domain datasets and the remaining datasets as the source-domain datasets. Compared with the POSC4.5 algorithm, POSTRF improves the classification performance of the model on the KC3 and CM1 target sets by transferring samples from the other, auxiliary datasets: the AUC value increases by about 3% to 12%, and the fault detection rate PD increases by about 5%. Therefore, by using cross-project or cross-company software fault data, the POSTRF algorithm proposed in this thesis achieves comparable or better predictive performance on target-domain fault samples than the traditional PU learning algorithm.
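The instance-weighting step of POSTRF in (1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the names `postrf_select` and `centroid_scorer` and all parameter defaults are hypothetical; `centroid_scorer` is a trivial placeholder standing in for a PU random decision tree; the target test set's PU labels (1 = labeled fault, 0 = unlabeled) are assumed to be what the AUC is computed against; and a source sample's final weight is assumed to be the sum of the weights of the trees whose sample set contains it.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A sample is (feature vector, PU label): 1 = labeled fault, 0 = unlabeled.
Sample = Tuple[Sequence[float], int]
Scorer = Callable[[Sequence[float]], float]


def auc(scores: List[float], labels: List[int]) -> float:
    """Rank-based (Mann-Whitney) AUC; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined without both classes; treat as chance level
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scorer(bag: List[Sample]) -> Scorer:
    """Hypothetical placeholder learner, NOT a PU random decision tree:
    scores a point by its closeness to the mean of the labeled faults in the bag."""
    pos = [x for x, y in bag if y == 1]
    if not pos:
        return lambda x: 0.0
    mean = [sum(col) / len(pos) for col in zip(*pos)]
    return lambda x: -sum((a - b) ** 2 for a, b in zip(x, mean))


def postrf_select(source: List[Sample], target: List[Sample],
                  train_tree: Callable[[List[Sample]], Scorer],
                  m: int = 10, bag_size: float = 1.0, test_frac: float = 0.75,
                  r: float = 0.3, seed: int = 0) -> List[Sample]:
    """Weight source samples by the AUC of the trees trained on them and
    return the top fraction r of source samples for transfer."""
    rng = random.Random(seed)
    n_src = len(source)
    weights = [0.0] * n_src
    # A random test_frac (75%) of the target domain is used to weight the trees.
    test = rng.sample(target, max(1, int(test_frac * len(target))))
    test_x = [x for x, _ in test]
    test_y = [y for _, y in test]
    for _ in range(m):
        # Bootstrap: draw bag_size * n_src source indices with replacement.
        idx = [rng.randrange(n_src) for _ in range(max(1, int(bag_size * n_src)))]
        model = train_tree([source[i] for i in idx])
        tree_weight = auc([model(x) for x in test_x], test_y)
        # Assumption: each distinct sample in the bag accumulates the tree's weight once.
        for i in set(idx):
            weights[i] += tree_weight
    k = int(r * n_src)
    ranked = sorted(range(n_src), key=lambda i: weights[i], reverse=True)
    return [source[i] for i in ranked[:k]]


# Usage on synthetic 1-D data (faults centered at 1.0, non-faults at 0.0).
rng = random.Random(1)
source = [([rng.gauss(1.0 if y else 0.0, 1.0)], y) for y in [1] * 20 + [0] * 80]
target = [([rng.gauss(1.0 if y else 0.0, 1.0)], y) for y in [1] * 10 + [0] * 40]
picked = postrf_select(source, target, centroid_scorer, m=5, r=0.25)
print(len(picked))  # 25
```

Swapping a real PU decision-tree learner in for `train_tree` would replace the placeholder; the selected samples would then be merged with the target-domain data before a POSC4.5 model is trained on the result.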
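The uncertain information gain in (1) is computed only from the positive and unlabeled counts and the positive-class prior. One formulation from the PU decision-tree literature, assumed here (the thesis's exact formula may differ), estimates the positive proportion at a node n by Bayes' rule, P(y=1 | n) ≈ min(1, (|POS_n| / |POS|) × (|UNL| / |UNL_n|) × PosLevel), where PosLevel is the positive-class prior, and weights each child branch by its share of the node's unlabeled examples. The function names below are hypothetical.

```python
import math
from typing import Hashable, List


def positive_proportion(pos_n: int, unl_n: int, pos_total: int,
                        unl_total: int, prior: float) -> float:
    """Estimate P(y=1 | node) by Bayes' rule, using
    P(node | y=1) ~ pos_n / pos_total and P(node) ~ unl_n / unl_total."""
    if pos_n == 0:
        return 0.0
    if unl_n == 0:
        # Assumption: a branch with positives but no unlabeled data is treated as pure.
        return 1.0
    estimate = (pos_n / pos_total) * (unl_total / unl_n) * prior
    return min(1.0, estimate)  # the estimate can exceed 1, so clamp it


def binary_entropy(p: float) -> float:
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def pu_information_gain(pos_values: List[Hashable], unl_values: List[Hashable],
                        pos_total: int, unl_total: int, prior: float) -> float:
    """Gain of splitting a node on a discrete attribute.
    pos_values / unl_values: the attribute value of each labeled-positive /
    unlabeled example that reaches this node."""
    if not unl_values:
        return 0.0
    parent = binary_entropy(positive_proportion(
        len(pos_values), len(unl_values), pos_total, unl_total, prior))
    children = 0.0
    for v in set(pos_values) | set(unl_values):
        p_v = sum(1 for x in pos_values if x == v)
        u_v = sum(1 for x in unl_values if x == v)
        branch_weight = u_v / len(unl_values)  # branch size measured on unlabeled data
        children += branch_weight * binary_entropy(
            positive_proportion(p_v, u_v, pos_total, unl_total, prior))
    return parent - children


# A perfectly separating attribute at the root (prior 0.5) gains one full bit.
print(pu_information_gain(["a"] * 10, ["a"] * 5 + ["b"] * 5, 10, 10, 0.5))  # 1.0
```

Because branch sizes are measured on the unlabeled set, which is assumed to follow the overall distribution, no negative labels are needed to pick the split attribute.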
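The fault detection rate PD reported in (2) is the recall of the fault class, PD = TP / (TP + FN). A minimal sketch with the hypothetical helper name `detection_rate`, assuming 1 marks a faulty module:

```python
from typing import Sequence


def detection_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """PD = TP / (TP + FN): the share of truly faulty modules predicted as faulty."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        return 0.0  # assumption: report PD as 0 when there are no faulty modules
    return tp / (tp + fn)


# Three faulty modules, two of them detected: PD = 2/3.
print(detection_rate([1, 1, 0, 1], [1, 0, 0, 1]))
```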
Keywords/Search Tags: software fault prediction, PU learning, transfer learning, random forest