Research On Software Fault Prediction Method Based On Transfer Learning And PU Learning

Posted on:2018-04-12

Degree:Master

Type:Thesis

Country:China

Candidate:R T Ma

Full Text:PDF

GTID:2348330515950420

Subject:Engineering

Abstract/Summary:

With the continuous development of artificial intelligence, machine learning has been applied to software fault prediction. Prediction models based on traditional machine learning need a large number of labeled samples to build. In practice, software fault data is often obtained through manual testing, which is time-consuming and costly. To reduce the number of labeled samples that traditional supervised software fault prediction requires, this thesis studies PU learning and transfer learning, and proposes a method for PU scenarios that uses cross-company or cross-project fault data for knowledge transfer to predict fault samples in the target project. The specific work is as follows:

(1) Instance transfer based on the random forest algorithm in PU scenarios (the POSTRF algorithm). The algorithm works in PU scenarios and builds on the idea of Bayesian interclass transfer. The samples to be predicted are treated as the target domain dataset, and the cross-company or cross-project software fault samples as the source domain dataset. The source domain dataset is sampled with replacement, and the resulting sample sets are used to train multiple PU random decision trees. Sample weights are calculated from the AUC of each tree and its sample set. A PU dataset is then created from the target domain data together with the transferred source samples whose distribution is similar to that of the target domain data. Finally, a POSC4.5 model is constructed to predict the software fault samples in the target domain. In detail, the algorithm first samples the source domain dataset at the bagSize ratio to obtain M sample sets and trains M PU random decision trees. It randomly samples 75% of the target domain dataset as a test set, takes each tree's AUC as the weight of that tree, uses the tree weight to weight the tree's sample set, and aggregates over the sample sets to obtain the final weighted samples. Using the transfer ratio r, it selects the high-weight samples and transfers them into the target set. It then builds a PU dataset from the target domain dataset and the transferred samples under the completely random hypothesis, calculates the uncertain information gain from the sizes of the positive and unlabeled sample sets and the positive prior probability, selects attributes and builds the tree model recursively, and predicts the target domain fault samples.

(2) Experiments on the POSTRF algorithm. Eight software fault datasets from the NASA database were collected as the experimental datasets. The 0kc3 and cm1 datasets are treated as the target domain datasets, and the remaining datasets as the source domain datasets. Compared with the POSC4.5 algorithm, the POSTRF algorithm improves the classification performance of the model on the 0kc3 and cm1 target sets by transferring samples from the other auxiliary datasets: the AUC value improves by about 3%-12%, and the fault detection rate (PD) increases by about 5%. Therefore, by using cross-project or cross-company software fault data, the POSTRF algorithm proposed in this thesis achieves comparable or better predictive performance for fault samples in the target domain than the traditional PU learning algorithm.
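The instance-weighting and transfer step described in part (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names (`bootstrap_bags`, `aggregate_sample_weights`, `select_transfer_samples`) are hypothetical, and the AUC of each PU random decision tree on the target test set is assumed to be computed elsewhere and passed in. Two details the abstract does not specify are also assumptions here: a sample's final weight is taken as the sum of the AUC weights of all trees whose sample set contains it, and the transfer ratio r is taken as a fraction of the source dataset size.

```python
import random
from collections import defaultdict


def bootstrap_bags(n_source, bag_size_ratio, m, seed=0):
    """Draw m sample sets of source indices, sampling with replacement.

    Each sample set holds round(bag_size_ratio * n_source) indices
    (at least one), mirroring sampling at the bagSize ratio.
    """
    rng = random.Random(seed)
    bag_len = max(1, round(bag_size_ratio * n_source))
    return [[rng.randrange(n_source) for _ in range(bag_len)]
            for _ in range(m)]


def aggregate_sample_weights(bags, tree_aucs):
    """Accumulate a weight for every source sample.

    Assumption: the tree trained on bags[i] carries weight tree_aucs[i],
    and every distinct sample in that bag receives the weight once.
    """
    if len(bags) != len(tree_aucs):
        raise ValueError("need exactly one AUC value per sample set")
    weights = defaultdict(float)
    for bag, tree_auc in zip(bags, tree_aucs):
        for idx in set(bag):  # duplicates inside one bag count once
            weights[idx] += tree_auc
    return dict(weights)


def select_transfer_samples(weights, n_source, r):
    """Return the indices of the round(r * n_source) highest-weight samples.

    Ties are broken by the smaller index so the result is deterministic.
    """
    k = round(r * n_source)
    ranked = sorted(weights, key=lambda i: (-weights[i], i))
    return ranked[:k]
```

For example, with sample sets `[[0, 1, 1], [1, 2], [2, 3]]` and tree AUC values `[0.6, 0.8, 0.5]`, samples 1 and 2 each appear in two sample sets and accumulate the largest weights, so a transfer ratio of 0.5 over four source samples selects them. The selected samples would then be merged with the target domain data before the POSC4.5 model is built.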

Keywords/Search Tags:

software fault prediction, PU learning, transfer learning, random forest

Related items

1	Research On Key Technologies Of Personal Behavior Prediction Based On Random Forest
2	Research Of Random Forest Transfer Learning Based On Instance
3	Research And Optimization Of Software Fault Prediction Model Based On Machine Learning Method
4	Research On Hard Disk Failure Prediction Method Based On Improved Random Forest Algorithm
5	Application Of Learning-to-rank Method Based On Random Forest In Self-made Dataset
6	Research On Heterogeneous Software Defect Prediction Based On Transfer Learning
7	Research On Software Defect Prediction Based On Machine Learning Algorithm
8	Software Fault Prediction Based On Machine Learning Approaches
9	Researches On Software Defect Prediction Methods Under Different Scenarios
10	Research On Software Vulnerability Prediction Method Based On Deep Transfer Learning