Software Defect Prediction Based On Semi-supervised Learning And Voting Theory

Posted on:2018-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:Q He

Full Text:PDF

GTID:2428330590477764

Subject:Software engineering

Abstract/Summary:

Software defect prediction is an important software quality assurance technique. It uses historical project data and previously discovered defects to predict potential defects. However, most existing methods assume that large amounts of labeled historical data are available. Projects in the early stages of their life cycle may lack the data needed to build supervised-learning-based predictors. In addition, most existing techniques do not consider cost-effectiveness, which limits their applicability in real defect prediction settings. In this thesis, we address both issues and propose a defect prediction approach based on semi-supervised learning and voting theory. The approach extends the classical supervised Random Forest algorithm with the self-training and boosting paradigms. It also adopts voting theory to rank software modules: each decision tree in the Random Forest is treated as a voter, and a decision-making rule determines the final defect-proneness ranking of the software modules.

The contributions of this work can be summarized as follows:
a) We adopt the self-training paradigm. An initial model is trained on the limited labeled samples and then used to predict the remaining unlabeled samples. The samples predicted with high confidence are selected and added to the initial training set, and the expanded training set is used to refine the initial model.
b) We introduce a boosting process into the approach to determine the weight of each base learner iteratively.
c) The final prediction is made with weighted decision-making rules, with each random tree in the Random Forest regarded as a voter. The approach predicts the priority (rank) of software modules rather than whether an individual module is defective.

A series of experiments was conducted to evaluate the proposed approach and to compare its effectiveness under different labeled sampling rates. The experimental results show that the proposed approach outperforms the compared supervised-learning-based and unsupervised-learning-based approaches. Moreover, when trained on a small labeled dataset, the proposed approach achieves performance comparable to some supervised approaches trained on larger labeled datasets. Among the three decision-making rules, the CO rule performs best when applied to the proposed approach. Increasing the labeled sampling rate yields only a negligible improvement in performance.
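The self-training loop in contribution a) can be illustrated with a small, dependency-free sketch. It is not the thesis's implementation. `NearestCentroid` is a hypothetical toy classifier standing in for the Random Forest, and the 0.8 confidence `threshold` and `max_rounds` cap are assumed values. The loop itself follows the steps described: train on the labeled modules, pseudo-label the confident unlabeled ones, add them to the training set, and refit.

```python
# Minimal self-training sketch. NearestCentroid is a hypothetical stand-in for
# the Random Forest used in the thesis; the self-training loop is the point.
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class NearestCentroid:
    """Toy classifier: one mean vector per class (0 = clean, 1 = defective)."""

    def __init__(self, xs: List[Vector], ys: List[int]) -> None:
        self.centroids: Dict[int, List[float]] = {}
        for label in sorted(set(ys)):
            members = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict_with_confidence(self, x: Vector) -> Tuple[int, float]:
        # Similarity = 1 / (1 + squared distance); confidence = normalized share
        sims = {
            label: 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(x, c)))
            for label, c in self.centroids.items()
        }
        best = max(sims, key=sims.get)
        return best, sims[best] / sum(sims.values())


def self_train(
    labeled_x: List[Vector],
    labeled_y: List[int],
    unlabeled_x: List[Vector],
    threshold: float = 0.8,  # hypothetical confidence cut-off
    max_rounds: int = 10,
) -> Tuple[NearestCentroid, List[Vector], List[int]]:
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    model = NearestCentroid(xs, ys)  # initial model from the few labeled modules
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, conf = model.predict_with_confidence(x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:  # nothing trustworthy left to pseudo-label
            break
        for x, label in confident:  # expand the training set with pseudo-labels
            xs.append(x)
            ys.append(label)
        pool = remaining
        model = NearestCentroid(xs, ys)  # refine the model on the expanded set
    return model, xs, ys


model, xs, ys = self_train([[0, 0], [10, 10]], [0, 1], [[1, 1], [9, 9], [5, 5]])
print(ys)  # [0, 1, 0, 1]: (5, 5) is equidistant, so it is never pseudo-labeled
```

Any classifier that reports a per-prediction confidence could replace `NearestCentroid`. With a Random Forest, for example, the confidence could be the fraction of trees that agree.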
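Contribution b) says that boosting determines each base learner's weight iteratively, but the abstract does not give the formula. The sketch below assumes the classic AdaBoost rule α_t = ½ ln((1 − ε_t) / ε_t). Here ε_t is tree t's weighted error on the labeled modules. The rule is applied to already-trained trees taken in a fixed order. The function name `boosted_voter_weights` and the choice to give trees no better than chance a weight of 0 are illustrative assumptions.

```python
# Sketch of AdaBoost-style voter weighting (an assumption: the thesis abstract
# does not state its exact boosting formula).
import math
from typing import List, Sequence


def boosted_voter_weights(
    predictions: Sequence[Sequence[int]],
    labels: Sequence[int],
    eps: float = 1e-10,
) -> List[float]:
    """predictions[t][i] is tree t's 0/1 prediction for labeled module i."""
    n = len(labels)
    w = [1.0 / n] * n  # uniform sample weights to start
    alphas: List[float] = []
    for preds in predictions:
        # Weighted error of this tree under the current sample weights
        err = sum(wi for wi, p, y in zip(w, preds, labels) if p != y)
        err = min(max(err, eps), 1.0 - eps)  # keep the logarithm finite
        # Classic AdaBoost weight; trees no better than chance get weight 0
        alpha = max(0.0, 0.5 * math.log((1.0 - err) / err))
        alphas.append(alpha)
        # Up-weight modules this tree got wrong, down-weight the ones it got right
        w = [wi * math.exp(alpha if p != y else -alpha) for wi, p, y in zip(w, preds, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return alphas


labels = [1, 0, 1, 0]
trees = [
    [1, 0, 1, 0],  # perfect on the labeled set
    [1, 0, 0, 0],  # one mistake
    [0, 1, 0, 1],  # always wrong
]
alphas = boosted_voter_weights(trees, labels)
print(alphas)
```

The second tree's weight is ½ ln 3 ≈ 0.549, because it misclassifies one of four equally weighted modules.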
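Contribution c) treats each tree as a weighted voter that ranks modules by defect-proneness. The sketch below makes several assumptions. Each tree is assumed to give every module a defect-proneness score, for example the fraction of defective training modules in the leaf it reaches. Two standard voting rules then aggregate the trees' rankings: weighted Borda count and weighted Copeland's method. The abstract names a "CO rule" among its three rules without defining it, so reading it as Copeland is only a guess. The function names are hypothetical.

```python
# Sketch of weighted voting rules that turn per-tree scores into a module ranking.
# scores[t][m]: tree t's (assumed) defect-proneness score for module m.
from itertools import combinations
from typing import List, Sequence


def weighted_borda(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[int]:
    n = len(scores[0])
    totals = [0.0] * n
    for s, w in zip(scores, weights):
        order = sorted(range(n), key=lambda m: s[m], reverse=True)  # most defect-prone first
        for pos, m in enumerate(order):
            totals[m] += w * (n - 1 - pos)  # top position earns n - 1 points
    return sorted(range(n), key=lambda m: totals[m], reverse=True)


def weighted_copeland(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[int]:
    n = len(scores[0])
    net_wins = [0] * n
    for a, b in combinations(range(n), 2):
        # Weighted support for "a is more defect-prone than b" and vice versa
        support_a = sum(w for s, w in zip(scores, weights) if s[a] > s[b])
        support_b = sum(w for s, w in zip(scores, weights) if s[b] > s[a])
        if support_a > support_b:
            net_wins[a] += 1
            net_wins[b] -= 1
        elif support_b > support_a:
            net_wins[b] += 1
            net_wins[a] -= 1
    return sorted(range(n), key=lambda m: net_wins[m], reverse=True)


scores = [
    [0.9, 0.2, 0.5],  # tree 0
    [0.8, 0.3, 0.6],  # tree 1
    [0.1, 0.7, 0.4],  # tree 2
]
print(weighted_borda(scores, [1, 1, 1]))     # [0, 2, 1]
print(weighted_copeland(scores, [1, 1, 1]))  # [0, 2, 1]
print(weighted_borda(scores, [1, 1, 5]))     # [1, 2, 0]: a heavily weighted voter dominates
```

The output is a priority order over modules rather than a defective/clean label. Inspection effort can then be spent from the top of the list down.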

Keywords/Search Tags:

Software Defect Prediction, Semi-supervised Learning, Rank of Software Defective Modules, Voting Theory

Related items

1	Research On Software Defect Prediction Method Based On Semi-supervised Integration
2	Research On Semi-supervised Learning Based Software Defect Prediction
3	Research On Software Defect Prediction For Cross-version Software
4	Research On Software Defect Prediction Based On Ensemble Learning
5	Software Defect Modeling And Prediction In Resource-constrained Scenarios
6	Research On Machine Learning Based Software Defect Prediction
7	Feature Extraction Based Software Defect Prediction
8	Research On Software Defect Prediction Based On Learning Mechanism
9	Software Defect Prediction Based On Machine Learning
10	Metrics-Based Software Defect Prediction