Software Defect Prediction Research For Unlabeled Datasets

Posted on:2017-08-26

Degree:Master

Type:Thesis

Country:China

Candidate:Z F Lu

Full Text:PDF

GTID:2348330509954400

Subject:Software engineering

Abstract/Summary:

A software defect is a case in which a software product fails to provide its required functions. Defects reduce the reliability and quality of software. Software defect prediction can improve the efficiency of software development and testing and thereby help ensure software quality. For unlabeled datasets, researchers have proposed unsupervised defect prediction methods and cross-project defect prediction methods based on software metric values, and these have achieved good prediction results. Both kinds of method can be applied quickly in engineering practice because they do not need labeled data for the target project. However, the unsupervised CLA (Clustering and LAbeling) method ignores the size of the difference between a metric value and its threshold, and cross-project defect prediction does not use label data when calculating the similarity between project distributions.

This thesis proposes PCLA (Probabilistic Clustering and LAbeling), an unsupervised defect prediction method, and a cross-project defect prediction method based on PCLA. The main contents are as follows:

1. PCLA is proposed. It estimates the probability that a class is defective by mapping the difference between each metric value and its threshold to a probability, and then makes its prediction by clustering and labeling.
2. The prediction performance of PCLA is analyzed with respect to how the metric threshold is calculated, the parameter a of the Sigmoid function, and the parameter O used to calculate the label threshold.
3. PCLA is compared with CLA. The results show that, on average, PCLA improves recall, precision, and F-measure by 4.58%, 2.56%, and 3.25% respectively on the 7 datasets of NetGene and ReLink.
4. Different models are compared for cross-project defect prediction. Ten prediction models are applied to 50 version datasets from 15 projects, and the results show that the SimpleLogistic model achieves the best performance.
5. A cross-project defect prediction method based on PCLA is proposed. It uses PCLA to label the target dataset, then calculates the distribution similarity between the candidate source projects and the target project, and finally trains a defect prediction model on source project data to predict the target project.
6. Nine strategies for calculating distribution similarity are compared. The results show that using the percentage of label data to calculate distribution similarity achieves the best performance. Compared with PCLA alone, cross-project defect prediction based on PCLA improves the success rate by 200%.

The methods proposed in this thesis for defect prediction on unlabeled datasets can be applied quickly in engineering practice, especially to new projects, and are helpful for improving software quality.
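The core idea of PCLA, mapping the difference between a metric value and its threshold to a probability, can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the median is used as each metric's threshold, differences are scaled by the metric's range, the per-metric probabilities are averaged into one score, and the clustering and labeling step is reduced to a single cutoff on that score. The names pcla_scores, pcla_label, a, and cutoff are hypothetical and are not taken from the thesis.

```python
import math
from statistics import median


def sigmoid(x, a=1.0):
    """Logistic function with steepness a (an assumed parameterization).

    Written via tanh, which equals 1 / (1 + exp(-a * x)) but cannot overflow.
    """
    return 0.5 * (1.0 + math.tanh(0.5 * a * x))


def pcla_scores(rows, a=1.0):
    """Return one defect score in [0, 1] per instance.

    rows: list of equal-length lists of metric values, one list per class.
    Assumptions: the threshold of a metric is its median, and each
    difference is divided by the metric's range to make metrics comparable.
    """
    columns = list(zip(*rows))
    thresholds = [median(col) for col in columns]
    # Fall back to 1.0 for constant metrics to avoid division by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    scores = []
    for row in rows:
        probs = [sigmoid((v - t) / s, a)
                 for v, t, s in zip(row, thresholds, spans)]
        scores.append(sum(probs) / len(probs))
    return scores


def pcla_label(rows, a=1.0, cutoff=0.5):
    """Label an instance defective (True) when its score exceeds cutoff."""
    return [score > cutoff for score in pcla_scores(rows, a)]


if __name__ == "__main__":
    # Three classes described by two metrics each (made-up values).
    data = [[1, 1], [2, 2], [10, 10]]
    print(pcla_scores(data))
    print(pcla_label(data))
```

Unlike a count of how many metrics exceed their thresholds, this score grows with the size of each excess, which is the limitation of CLA that the abstract points out. A larger value of a makes the score behave more like a hard count, and the cutoff plays the role of a label threshold.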

Keywords/Search Tags:

Software engineering, Defect prediction, Unlabeled, Unsupervised, Cross-project

Related items

1	Research On Cross-Project Software Defect Prediction
2	Research On Software Defect Prediction Method Based On Training Data Selection
3	Software Defect Prediction Strategy Design For Imbalanced Data
4	Cross-Project Software Defect Prediction Methods Based On Autoencoder
5	Software defect prediction on unlabeled dataset
6	Design And Implementation Of Instance Selection Based Ensemble Cross-project Defect Prediction Method
7	Cross-project Software Defect Prediction Based On Machine Learning
8	Research On Some Key Technologies Of Software Defect Prediction
9	Research On Data Preprocessing And Integrated Forecasting Methods In Cross-project Software Defect Forecasting
10	Research On Cross-project Software Defect Prediction By Transfer Learning