
Research On Semi-supervised Learning Based Software Defect Prediction

Posted on: 2017-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: S P Liao
Full Text: PDF
GTID: 2348330509954399
Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology and the growing scale and complexity of software, software reliability has become a focus of attention. Software defects are an important threat to reliability, and detecting and fixing them costs a great deal of manpower and resources. Moreover, the longer a defect remains in the software, the more it costs to fix. Predicting defects before release has therefore become a pressing problem. Most traditional static defect prediction techniques are supervised and need a large amount of historical data for training. Unfortunately, obtaining a large amount of labeled data is costly in practice, especially for projects that have no previous versions. In addition, software defect data are inherently imbalanced, and the cost of misclassifying a defect-prone module is generally several times higher than that of misclassifying a non-defect-prone module. Achieving a low misclassification cost is therefore more important than achieving a low classification error rate.

This thesis proposes a software defect prediction model based on a semi-supervised support vector machine, which exploits a large amount of unlabeled data in addition to a few labeled examples to build the prediction model, and presents a validation of the method. By extending the semi-supervised support vector machine and combining it with cost-sensitive learning, a software defect prediction algorithm based on a cost-sensitive semi-supervised support vector machine is proposed. The method addresses total misclassification cost, class imbalance, and the scarcity of labeled data. The main work is as follows:

1. The relevant theory of software defect prediction and the related work are analyzed, the open problems in the field are discussed, and the techniques for data extraction and preprocessing in software defect prediction are investigated.
2. To tackle the issue that predictive models may perform poorly when learning from a small labeled training set, an unsupervised sampling method is proposed.
3. A software defect prediction model that combines the semi-supervised support vector machine with the unsupervised sampling method is built to tackle the problem of limited defect data. By extending this model and combining it with cost-sensitive learning, a novel software defect prediction model based on a cost-sensitive semi-supervised support vector machine is proposed. This method addresses two issues: limited labeled data and insufficient sensitivity to defective modules.
4. To verify the effectiveness of the proposed method, experiments are conducted on four NASA datasets and the results are analyzed in terms of accuracy, F-measure, and Normalized Expected Cost of Misclassification.

The experimental results show that the proposed approach, which combines the cost-sensitive semi-supervised support vector machine with the sampling method, achieves performance comparable to that of supervised learning models while using little labeled defect information, which makes it more practical for real-world applications. Moreover, the proposed method outperforms other semi-supervised learning methods in terms of recall and F-measure. The model offers a new idea and solution for the software defect prediction problem.
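The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis. It assumes labels of 1 for defect-prone and 0 for non-defect-prone modules, and it assumes one common form of the Normalized Expected Cost of Misclassification, NECM = (FP + r * FN) / N, where r is the ratio of the cost of missing a defect-prone module to the cost of a false alarm. The function names and the default ratio of 10 are hypothetical, and the thesis may use a different normalization.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, assuming labels are 1 (defect-prone) or 0."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0: a missed defect-prone module
            fn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, cost_ratio=10.0):
    """Return accuracy, recall, F-measure and NECM for binary predictions.

    cost_ratio is an assumed value for (cost of a false negative) /
    (cost of a false positive); it is illustrative, not from the thesis.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": (tp + tn) / n,
        "recall": recall,
        "f_measure": f_measure,
        # Missed defects are weighted cost_ratio times more than false alarms.
        "necm": (fp + cost_ratio * fn) / n,
    }


# Imbalanced toy data: 2 defect-prone modules out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
```

On this toy data one missed defect and one false alarm leave accuracy at 0.8, yet the single missed defect accounts for most of the NECM. That is the sense in which a lower misclassification cost can matter more than a lower error rate on imbalanced defect data.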
Keywords/Search Tags:software defect prediction, cost-sensitive, semi-supervised, SVM, Sample