Research Of Software Fault Detection Method Based On PU Learning

Posted on: 2016-05-21  Degree: Master  Type: Thesis
Country: China  Candidate: H Zhang  Full Text: PDF
GTID: 2308330461966596  Subject: Computer software and theory
Abstract/Summary:
Software of many kinds plays an important role in modern society, and our work and daily life rely more and more on software systems such as office suites, media players, games, and blogs. The more people depend on software, the higher the demands on its reliability. Software developers must therefore use fault detection methods to avoid faults that could lead to great losses. However, software fault data is hard to collect, because labeling unknown data is time-consuming and error-prone for experts. Accordingly, this thesis proposes classifier ensemble methods for software fault detection that use only positive and unlabeled data, that is, PU learning. The research covers the following two ensemble methods:
(1) Static classifier ensemble method SB_POSC4.5. This method first uses the SMOTE algorithm to balance the data distribution, and then applies sampling with replacement several times to construct different datasets. Next, it uses the POSC4.5 algorithm as the base classifier for Bagging to build a classifier ensemble model. The final output of the ensemble is determined by majority voting. This algorithm is a static ensemble fusion method.
(2) Dynamic classifier ensemble method DCS_LPD. After the classifier ensemble is built, weights are assigned to the base classifiers according to the features of each individual test sample. The LPD (Local Probability of Detection) measure is used to assign the weights. The base classifier with the highest LPD receives the highest weight, and its prediction is taken as the final output of the ensemble. This algorithm is extended from DCS_LCA and is applicable to software fault detection.
The two algorithms are compared and analyzed experimentally. Experiments were conducted on 12 datasets from the NASA MDP repository, and the results are reported in tables and figures.
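The difference between the static and the dynamic fusion step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not define LPD precisely, so `local_pd` assumes it is the fraction of faulty samples, among the k nearest validation neighbours of the test sample, that a base classifier labels as faulty. The threshold "stumps" are hypothetical stand-ins for POSC4.5 trees, SMOTE is omitted, and all function names and the toy data are invented for illustration.

```python
import random
from collections import Counter

def bootstrap(samples, rng):
    """One Bagging replicate: draw len(samples) items with replacement."""
    return [rng.choice(samples) for _ in samples]

def sq_dist(u, v):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def majority_vote(classifiers, x):
    """Static fusion: every base classifier votes; the most common label wins."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

def local_pd(clf, x, validation, k):
    """ASSUMED definition of LPD: recall of clf on the faulty samples (label 1)
    among the k validation samples closest to x."""
    neighbours = sorted(validation, key=lambda s: sq_dist(s[0], x))[:k]
    faulty = [features for features, label in neighbours if label == 1]
    if not faulty:
        return 0.0
    hits = sum(1 for features in faulty if clf(features) == 1)
    return hits / len(faulty)

def dynamic_select(classifiers, x, validation, k=3):
    """Dynamic selection: use the prediction of the classifier with the highest LPD."""
    best = max(classifiers, key=lambda clf: local_pd(clf, x, validation, k))
    return best(x)

def stump(threshold):
    """Hypothetical base classifier: predicts faulty (1) above a threshold."""
    return lambda x: 1 if x[0] > threshold else 0

# Toy data: (features, label) pairs with label 1 = faulty module.
classifiers = [stump(2.5), stump(5.0), stump(8.5)]
validation = [((1.0,), 0), ((2.0,), 0), ((3.0,), 1), ((8.0,), 1), ((9.0,), 1)]
x = (4.0,)

static_label = majority_vote(classifiers, x)
dynamic_label = dynamic_select(classifiers, x, validation, k=3)
replicate = bootstrap(validation, random.Random(0))
print(static_label, dynamic_label)  # prints "0 1"
```

On this toy input the majority vote outputs 0, because two of the three stumps vote non-faulty, while the dynamic rule outputs 1, because the first stump is the only one that detects the faulty neighbour near the test sample. This is the kind of per-sample disagreement that dynamic selection is meant to exploit.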
The results show that the probability of detection of SB_POSC4.5 is 0.067 lower than that of C4.5, because the PU method does not use negative samples. The probability of detection of the static ensemble method SB_POSC4.5 reaches up to 89.1%. The F1 measure of DCS_LPD is about 86.3%, about 20.6% higher than that of SB_POSC4.5. This suggests that the dynamic classifier ensemble generally performs better than the static ensemble.
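For readers unfamiliar with the two reported measures, the sketch below computes them from predicted and true labels using the standard definitions: probability of detection is recall on the faulty class, and F1 is the harmonic mean of precision and recall. Treating label 1 as the faulty class is an assumption, and the function names and labels are illustrative only, not results from the thesis.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn

def probability_of_detection(y_true, y_pred, positive=1):
    """PD (recall): share of truly faulty samples that were flagged as faulty."""
    tp, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_measure(y_true, y_pred, positive=1):
    """F1: harmonic mean of precision and recall on the faulty class."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative labels only (1 = faulty, 0 = non-faulty).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
pd_value = probability_of_detection(y_true, y_pred)
f1_value = f1_measure(y_true, y_pred)
print(pd_value, f1_value)  # prints "0.75 0.75"
```

Because F1 also penalizes false alarms, a method can have a high probability of detection and still a noticeably lower F1.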
Keywords/Search Tags:software fault prediction, PU learning, imbalanced data, decision tree, ensemble classifier