
Ensemble Learning For Software Defect Detection

Posted on: 2016-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: W C Huang
Full Text: PDF
GTID: 2308330473465515
Subject: Information security
Abstract/Summary:
Software defect prediction has become one of the most important research topics in software engineering in recent years. Approaches fall into two broad types: dynamic and static defect prediction. Static prediction uses historical software data to predict defects; it is widely applicable and accurate, and it has been extensively researched and used. The key to static defect prediction is analyzing the historical data to build an accurate classification model that distinguishes defective software from defect-free software.

In software defect detection, defective samples are far fewer than defect-free samples, which leads to a serious class-imbalance problem. Solving this problem is a critical issue for software defect detection. It is generally handled with re-sampling methods or cost-sensitive methods. Here, we use a re-sampling based Bagging method: each time a weak classifier is trained, we re-sample a class-balanced subset of the samples. Fusing the weak classifiers yields a strong classifier with better generalization ability and classification accuracy.

In ensemble learning, classification performance can be improved further by improving the individual classifiers, by increasing the randomness of the weak classifiers, or by finding a suitable fusion method. We use the down-sampling method above to improve the individual classifiers. On this basis, we increase the randomness and optimize the fusion method to further improve the model's results.

In ensemble learning, the more independent the weak classifiers are, the better the final result. Compared with random-sample based methods, random-feature based methods produce more independent weak classifiers and a more stable and accurate strong classifier.
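The class-balanced re-sampling step described above can be illustrated with a minimal sketch. It assumes binary labels (1 for defective, 0 for defect-free) and down-samples the majority class without replacement; the function names `balanced_bag` and `random_subspace` are hypothetical, and the plain random feature selection shown here is only a stand-in for the thesis's own feature-based subspace method, which the abstract does not specify.

```python
import random

def balanced_bag(labels, rng):
    """Return sorted indices of one class-balanced training subset.

    Assumption: every minority-class sample is kept and an equally sized
    subset of the majority class is drawn without replacement (down-sampling).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    sampled = rng.sample(majority, len(minority))
    return sorted(minority + sampled)

def random_subspace(n_features, k, rng):
    """Pick k distinct feature indices uniformly at random (illustrative only)."""
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # 2 defective, 8 defect-free (toy data)
# One balanced subset per weak classifier; each contains 2 samples of each class.
bags = [balanced_bag(labels, rng) for _ in range(3)]
```

Each weak classifier would then be trained on its own bag, optionally restricted to the columns returned by `random_subspace`, so that the classifiers differ both in the samples and in the features they see.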
In this paper, we propose a novel approach that applies a feature-structure based random subspace method to software defect prediction, which further improves the final result.

The above method yields a series of weak classifiers. Because of the special nature of binary classification, overall accuracy alone cannot accurately describe the performance of a binary classification model. We therefore propose a weighted fusion method based on the comprehensive evaluation index F-measure to further optimize the model's results.

We ran experiments with all of the proposed methods on the NASA software defect datasets and compared them with some of the most popular methods of recent years. The experiments show that our method achieves the best results on the ten NASA software defect datasets.
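As an illustration of F-measure based weighted fusion, the sketch below weights each weak classifier's vote by its F-measure on a validation set. The exact weighting formula used in the thesis is not given in the abstract, so the direct use of the F-measure as the weight, the 0.5 decision threshold, and the helper names `f_measure` and `weighted_vote` are all assumptions.

```python
def f_measure(y_true, y_pred, positive=1):
    """F1 score of the positive (defective) class; 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def weighted_vote(predictions, weights, threshold=0.5):
    """Fuse 0/1 predictions from several classifiers by a weighted vote."""
    total = sum(weights)
    if total == 0:
        # Fall back to an unweighted vote if every weight is zero.
        weights = [1.0] * len(predictions)
        total = float(len(predictions))
    fused = []
    for j in range(len(predictions[0])):
        score = sum(w * preds[j] for w, preds in zip(weights, predictions)) / total
        fused.append(1 if score >= threshold else 0)
    return fused

# Toy validation labels and the outputs of three hypothetical weak classifiers.
y_val = [1, 0, 1, 0]
preds = [
    [1, 0, 1, 0],  # perfect on the validation set
    [0, 0, 0, 1],  # finds no defective sample
    [1, 1, 1, 1],  # flags everything as defective
]
weights = [f_measure(y_val, p) for p in preds]
fused = weighted_vote(preds, weights)
```

Weighting by F-measure rather than accuracy means that a classifier which labels nearly everything defect-free receives little or no weight, even though its raw accuracy on imbalanced data could be high.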
Keywords/Search Tags:Software defect detection, Improved Bagging, Feature Construction, Random Feature Subspace, Classifier Fusion