
Research On Software Defect Prediction Based On Learning Mechanism

Posted on: 2019-03-26   Degree: Master   Type: Thesis
Country: China   Candidate: X Zhang   Full Text: PDF
GTID: 2428330542494358   Subject: Software engineering
Abstract/Summary:
As software becomes more varied and pervasive in everyday life, its reliability has become essential to the routine transactions people depend on. Many factors threaten the reliability of software systems, and software defects are among the most important. Discovering potential defects early helps improve software quality and reduce cost. Software defect prediction uses previously collected defect data to predict defects in new modules. In practice, however, labeled samples are difficult to obtain. Moreover, defective modules are far fewer than non-defective ones, that is, software defect data sets are class-imbalanced, and this imbalance degrades the prediction results. To address these problems, this thesis uses machine learning methods to build a new software defect prediction model. The main work of this thesis is as follows:

(1) To address the shortage of labeled samples and the class imbalance, a semi-supervised ensemble learning model for software defect prediction (Tri_Adaboost) is proposed. On the one hand, to alleviate the shortage of labeled samples, the labeled sample set is extended using under-sampling and the semi-supervised learning method Tri-training, with a randomly selected part of the unlabeled samples being pre-labeled. On the other hand, because the extended data set is still class-imbalanced, the SMOTE algorithm is first applied to resample it, and the AdaBoost ensemble learning method is then used for classification and prediction. The experimental results indicate that the proposed method clearly improves the prediction performance of the model.

(2) A null pointer reference defect data set is generated from open source projects. The data sets commonly used to validate software defect prediction models are module-based: they indicate only that a specific module contains a defect, not the type of that defect. To verify the validity of the Tri_Adaboost prediction model, null pointer reference defects are extracted from open source projects, and null pointer reference defect data sets are generated from the software's metric information.

(3) The NASA MDP data set and the null pointer reference defect data set generated from open source projects are used to validate the proposed prediction model. The comparative analysis shows that the Tri_Adaboost algorithm achieves higher F-measure and AUC values.
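
The abstract names SMOTE as the resampling step and F-measure and AUC as the evaluation metrics, but gives no implementation details. The following is a minimal sketch of those three building blocks only, written against the Python standard library so that it runs without dependencies. It is not the thesis's code: the function names, the neighbour count `k`, the fixed `seed`, and the toy data are all assumptions made for illustration, and the Tri-training and AdaBoost stages are not shown.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random defective module is scored above a random
    non-defective one; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage: four defective modules described by two made-up metric values.
defective = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
extra = smote(defective, n_new=6)
print(len(extra), "synthetic defective samples")
print("F-measure:", f_measure([1, 1, 0, 0], [1, 0, 1, 0]))
print("AUC:", auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Because SMOTE interpolates between existing minority samples instead of duplicating them, the oversampled set contains new points inside the region the defective class already occupies. In practice one would more likely rely on maintained implementations, such as `SMOTE` in imbalanced-learn and `AdaBoostClassifier` in scikit-learn, than on a hand-written version like this one.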
Keywords/Search Tags: software defect prediction, class imbalance, semi-supervised learning, AdaBoost