Research Of Software Fault Detection Method Based On PU Learning

Posted on:2016-05-21

Degree:Master

Type:Thesis

Country:China

Candidate:H Zhang

Full Text:PDF

GTID:2308330461966596

Subject:Computer software and theory

Abstract/Summary:

Software plays an increasingly important role in modern society, and both work and daily life rely more and more on software systems such as office suites, media players, games, and blogs. The more people depend on software, the higher the demands on its reliability. Developers therefore need software fault detection methods to avoid faults that could cause great losses. However, software fault data is hard to collect, because having experts label unknown data is time-consuming and error-prone. Accordingly, this thesis proposes classifier ensemble methods for software fault detection that use only positive and unlabeled data, that is, PU learning. The research covers the following two ensemble methods:

(1) Static classifier ensemble method SB_POSC4.5. This method first uses the SMOTE algorithm to balance the data distribution, then applies repeated sampling (sampling with replacement) several times to construct different datasets. The POSC4.5 algorithm then serves as the base classifier for Bagging to build the ensemble model, and the final output of the ensemble is determined by majority voting. This is a static ensemble fusion method.

(2) Dynamic classifier ensemble method DCS_LPD. After the ensemble is built, weights are assigned to the base classifiers according to the features of each individual test sample, using the LPD (Local Probability of Detection) measure. The base classifier with the highest LPD receives the highest weight, and its prediction is taken as the final output of the ensemble. This algorithm extends DCS_LCA and is applicable to software fault detection.

The two algorithms are compared and analyzed in experiments on 12 datasets from the NASA MDP collection. The results are presented in tables and figures.
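
The contrast between static fusion (majority voting over a bagged ensemble) and dynamic selection (picking one base classifier per test sample) can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: the base learner is a hypothetical one-feature threshold rule standing in for POSC4.5, the SMOTE step is omitted, and `local_pd` is an assumed reading of LPD as the detection rate on the faulty samples among the k nearest validation neighbours. All function names are invented for this example.

```python
import math
import random

def train_stump(samples):
    """Fit a one-feature rule: predict faulty (1) when x[j] >= t.
    Hypothetical stand-in for the POSC4.5 base learner."""
    n_features = len(samples[0][0])
    best = (len(samples) + 1, 0, math.inf)  # (errors, feature index, threshold)
    for j in range(n_features):
        # Candidate thresholds: observed values, plus inf ("never faulty").
        candidates = sorted({x[j] for x, _ in samples}) + [math.inf]
        for t in candidates:
            errors = sum((x[j] >= t) != bool(y) for x, y in samples)
            if errors < best[0]:
                best = (errors, j, t)
    return best[1], best[2]

def predict(stump, x):
    j, t = stump
    return int(x[j] >= t)

def build_bagging_ensemble(samples, n_models, seed=0):
    """Train each base classifier on a resample drawn with replacement."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_models):
        resample = [rng.choice(samples) for _ in samples]
        ensemble.append(train_stump(resample))
    return ensemble

def majority_vote(ensemble, x):
    """Static fusion: faulty only if a strict majority votes faulty."""
    votes = sum(predict(s, x) for s in ensemble)
    return int(2 * votes > len(ensemble))

def local_pd(stump, x, validation, k):
    """Assumed LPD: share of faulty samples among the k nearest
    validation neighbours of x that the classifier flags as faulty."""
    neighbours = sorted(validation, key=lambda s: math.dist(s[0], x))[:k]
    faulty = [s for s in neighbours if s[1] == 1]
    if not faulty:
        return 0.0
    return sum(predict(stump, fx) for fx, _ in faulty) / len(faulty)

def dynamic_select(ensemble, x, validation, k=3):
    """Dynamic selection: use the classifier with the highest local PD."""
    best = max(ensemble, key=lambda s: local_pd(s, x, validation, k))
    return predict(best, x)
```

With a labeled validation list `validation`, `build_bagging_ensemble(validation, 5)` returns five rules; `majority_vote` then gives the static decision and `dynamic_select` the per-sample decision for the same ensemble.
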
The results show that the probability of detection of SB_POSC4.5 is 0.067 lower than that of C4.5, because the PU method does not use negative samples. The probability of detection of the static ensemble method SB_POSC4.5 reaches up to 89.1%. The F1 measure of DCS_LPD is about 86.3%, which is about 20.6% higher than that of SB_POSC4.5. This suggests that the dynamic classifier ensemble generally performs better than the static one.
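
For reference, the two reported measures can be computed from confusion-matrix counts as sketched below. This assumes the standard definitions (probability of detection as recall on the faulty class, F1 as the harmonic mean of precision and recall); the abstract does not state its formulas, and the function names and counts used here are purely illustrative, not taken from the thesis.

```python
def probability_of_detection(tp, fn):
    """PD (recall): share of truly faulty modules that were flagged."""
    total_faulty = tp + fn
    return tp / total_faulty if total_faulty else 0.0

def f1_measure(tp, fp, fn):
    """F1: harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Illustrative counts only (not results from the thesis).
pd_value = probability_of_detection(tp=8, fn=2)
f1_value = f1_measure(tp=8, fp=4, fn=2)
```
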

Keywords/Search Tags:

software fault prediction, PU learning, imbalanced data, decision tree, ensemble classifier

Related items

1	Imbalanced Data Classification And Its Application In The Prediction Of The Mobile Phone Replacement
2	Research On Imbalanced Data Processing In Software Defect Prediction
3	Research On Decision Tree Classification Method Of Imbalanced Data Based On Reinforcement Learning
4	An Adaptive Sampling Ensemble Classifier For Learning From Imbalanced Data Sets
5	The Research On Classifier Ensemble Learning For Data Mining
6	Research And Optimization Of Software Fault Prediction Model Based On Machine Learning Method
7	A New Random Projection-Based Ensemble Classifier For High Dimensional Imbalance Data
8	Application Of Ensemble Decision Tree De Based On Improved Data Protocol In Medical Decision-Making
9	Research Of Imbalanced Data Ensemble Classification Algorithm Based On Oversampling
10	Research On Ensemble Learning