Software Defect Prediction Research For Unlabeled Datasets

Posted on: 2017-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Lu
Full Text: PDF
GTID: 2348330509954400
Subject: Software engineering
Abstract/Summary:
A software defect is a failure of a software product to meet its required functionality. Defects reduce the reliability and quality of software. Software defect prediction can make development and testing more efficient and so help ensure software quality. For unlabeled datasets, researchers have proposed unsupervised defect prediction and cross-project defect prediction methods based on software metric values, and these methods have achieved good prediction results. Because they do not need labeled data for the target project, both can be applied quickly in engineering practice. However, the CLA (Clustering and LAbeling) unsupervised method ignores how far a metric value lies from its threshold, and the cross-project method does not use label data when calculating the distribution similarity between projects.

This thesis proposes the PCLA (Probabilistic Clustering and LAbeling) unsupervised defect prediction method and a cross-project defect prediction method based on PCLA. The main contents are as follows:

1. Proposes the PCLA unsupervised defect prediction method, which estimates the probability that a class is defective by mapping the difference between each metric value and its threshold to a probability, and then predicts defective classes by clustering and labeling.
2. Analyzes how the prediction performance of PCLA depends on the way the metric threshold is calculated, the parameter a of the Sigmoid function, and the parameter O used to calculate the label threshold.
3. Compares the performance of PCLA and CLA. The results show that, on average, PCLA improves recall, precision, and F-measure by 4.58%, 2.56%, and 3.25% respectively on 7 datasets from NetGene and ReLink.
4. Compares different models for cross-project defect prediction. Ten prediction models are applied to 50 version datasets from 15 projects, and the results show that the SimpleLogistic model achieves the best performance.
5. Proposes a cross-project defect prediction method based on PCLA, which uses PCLA to label the target dataset, then calculates the distribution similarity between the candidate source projects and the target project, and finally trains a defect prediction model on the source project to predict the target project.
6. Compares 9 strategies for calculating distribution similarity. The results show that using the percentage of labeled data to calculate distribution similarity achieves the best performance. Compared with PCLA alone, cross-project defect prediction based on PCLA improves the success rate by 200%.

The methods proposed in this thesis for defect prediction on unlabeled datasets can be applied quickly in engineering practice, especially for new projects, and help improve software quality.
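The abstract gives no formulas for PCLA, so the following Python sketch only illustrates the general idea of mapping the difference between a metric value and its threshold to a probability and then clustering and labeling. Everything specific in it is an assumption made for illustration: the median as the metric threshold, the mean of per-metric probabilities as the score, the bucketing used as "clusters", and the names pcla_sketch, cutoff, and bins. Only the Sigmoid function and its parameter a come from the abstract.

```python
import math
from collections import defaultdict
from statistics import median


def sigmoid(x, a=1.0):
    """Logistic function; `a` controls how sharply the probability rises."""
    z = max(-60.0, min(60.0, a * x))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-z))


def pcla_sketch(rows, a=1.0, cutoff=0.5, bins=10):
    """Label each row (one class or module) as defect-prone or not.

    rows   -- list of equal-length lists of metric values
    a      -- Sigmoid slope (the abstract's parameter a)
    cutoff -- hypothetical label threshold on the cluster's mean score
    bins   -- hypothetical number of score buckets used as clusters
    """
    n_metrics = len(rows[0])
    # Assumption: each metric's threshold is its median over the dataset.
    thresholds = [median(row[j] for row in rows) for j in range(n_metrics)]

    # Map (value - threshold) to a probability per metric, then average.
    scores = [
        sum(sigmoid(v - t, a) for v, t in zip(row, thresholds)) / n_metrics
        for row in rows
    ]

    # Cluster: rows whose scores fall in the same bucket share a cluster.
    clusters = defaultdict(list)
    for i, s in enumerate(scores):
        clusters[min(int(s * bins), bins - 1)].append(i)

    # Label: a cluster is defect-prone if its mean score exceeds the cutoff.
    labels = [False] * len(rows)
    for members in clusters.values():
        cluster_mean = sum(scores[i] for i in members) / len(members)
        if cluster_mean > cutoff:
            for i in members:
                labels[i] = True
    return scores, labels


# Four modules described by three made-up metric values each.
scores, labels = pcla_sketch([[10, 2, 1], [50, 9, 7], [12, 3, 1], [80, 12, 9]])
print(labels)  # [False, True, False, True]
```

Unlike a hard comparison against the threshold, the Sigmoid lets a value far above its threshold contribute more than one barely above it, which is the gap in CLA that the abstract points to. Raw differences depend on each metric's scale, so a real implementation would likely need to normalize the metrics or tune a per metric.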
Keywords/Search Tags: Software engineering, Defect prediction, Unlabeled, Unsupervised, Cross-project