
Software Defect Prediction Based On Semi-supervised Learning And Voting Theory

Posted on: 2018-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q He
GTID: 2428330590477764
Subject: Software engineering
Abstract/Summary:
Software defect prediction is an important software quality assurance technique. It uses historical project data and previously discovered defects to predict potential defects. However, most existing methods assume that large amounts of labeled historical data are available. Projects in the early stages of their life cycle may lack the data needed to build supervised-learning-based predictors. In addition, most existing techniques do not consider cost-effectiveness, which limits their applicability in real defect prediction settings. In this paper, we address both issues and propose a defect prediction approach based on semi-supervised learning and voting theory. The approach extends the classical supervised Random Forest algorithm with the self-training and boosting paradigms. It also uses voting theory to rank software modules. Each decision tree in the Random Forest is treated as a voter, and a decision-making rule determines the final defect-proneness ranking of the software modules. The contributions of our work can be summarized as follows:

a) We adopt the self-training paradigm. An initial model is trained on the limited labeled samples and is then used to predict the remaining unlabeled samples. The samples predicted with high confidence are selected and added to the initial training set. The expanded training set is used to refine the initial model.
b) We introduce a boosting process into the proposed approach to determine the weight of each base learner iteratively.
c) The final prediction is based on weighted decision-making rules, with each random tree in the Random Forest regarded as a voter. The approach predicts the priority of software modules rather than whether each individual module is defective.

A series of experiments is conducted to evaluate the proposed approach and to compare its effectiveness under different labeled sampling rates. Experimental results show that the proposed approach outperforms the compared supervised-learning-based and unsupervised-learning-based approaches. When trained on a small labeled dataset, the proposed approach also achieves performance comparable to some supervised learning approaches trained on larger labeled datasets. Among the three decision-making rules, the CO rule performs best when applied to the proposed approach. Increasing the labeled sampling rate yields only a negligible improvement in the performance of the proposed approach.
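The self-training loop in contribution a) and the per-learner weighting in b) can be sketched with scikit-learn's `RandomForestClassifier`. This is a minimal sketch under stated assumptions, not the thesis's implementation. The confidence threshold, the number of rounds, the AdaBoost-style weight formula alpha = 0.5 * ln((1 - err) / err), and the helper names `self_train_forest` and `boosted_tree_weights` are all hypothetical choices. The synthetic data exist only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def self_train_forest(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=5,
                      n_trees=50, seed=0):
    """Self-training (assumed details): fit on labeled data, pseudo-label the
    unlabeled samples predicted with confidence >= threshold, add them to the
    training set, and refit. Stops when no confident samples remain."""
    X_train = np.asarray(X_lab, dtype=float)
    y_train = np.asarray(y_lab)
    pool = np.asarray(X_unlab, dtype=float)
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X_train, y_train)  # initial model on the limited labeled set
    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        proba = forest.predict_proba(pool)
        confident = proba.max(axis=1) >= threshold  # hypothetical selection rule
        if not confident.any():
            break
        pseudo = forest.classes_[proba[confident].argmax(axis=1)]
        X_train = np.vstack([X_train, pool[confident]])
        y_train = np.concatenate([y_train, pseudo])
        pool = pool[~confident]
        forest.fit(X_train, y_train)  # refine on the expanded training set
    return forest, len(X_train)


def boosted_tree_weights(forest, X_lab, y_lab, eps=1e-3):
    """AdaBoost-style voter weights (an assumed stand-in for the thesis's
    boosting step): visit trees in order, weight each by its weighted error,
    then up-weight the samples it misclassified for the next tree."""
    X = np.asarray(X_lab, dtype=float)
    # Trees inside a fitted forest predict class *indices*, not labels.
    y_idx = np.searchsorted(forest.classes_, np.asarray(y_lab))
    sample_w = np.full(len(X), 1.0 / len(X))
    weights = []
    for tree in forest.estimators_:
        wrong = tree.predict(X).astype(int) != y_idx
        err = np.clip(sample_w[wrong].sum(), eps, 1 - eps)
        alpha = max(0.5 * np.log((1 - err) / err), 0.0)  # drop worse-than-chance trees
        weights.append(alpha)
        sample_w *= np.exp(np.where(wrong, alpha, -alpha))
        sample_w /= sample_w.sum()
    return np.array(weights)


# Synthetic example: 20 labeled modules, 200 unlabeled ones, 4 metrics each.
rng = np.random.default_rng(0)
X = rng.normal(size=(220, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
forest, n_train = self_train_forest(X[:20], y[:20], X[20:], threshold=0.8)
tree_weights = boosted_tree_weights(forest, X[:20], y[:20])
print(n_train, tree_weights[:5].round(2))
```

The weights are computed on the originally labeled samples only, so pseudo-labels do not influence them. Each tree was fit on bootstrap copies of those same samples, so its measured error is optimistic. This is why the error is clipped to [eps, 1 - eps] before it enters the logarithm.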
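Contribution c) treats each tree as a voter whose ranking of modules is aggregated by a weighted decision-making rule. The abstract names only the "CO" rule among the three it compares. For illustration, the sketch below uses two standard voting rules: a weighted Borda count and a weighted Copeland method. They are not claimed to be the thesis's rules. The per-tree scores are hypothetical, hard-coded defect probabilities, and every voter is assumed to rank every module without ties.

```python
from collections import defaultdict
from itertools import combinations


def ranking_from_scores(scores):
    """Order module ids from most to least defect-prone by one voter's scores."""
    return sorted(scores, key=lambda m: (-scores[m], m))


def weighted_borda(rankings, weights):
    """Weighted Borda count: position p in an n-item ranking earns
    (n - 1 - p) points, scaled by the voter's weight."""
    score = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for pos, module in enumerate(ranking):
            score[module] += w * (n - 1 - pos)
    return sorted(score, key=lambda m: (-score[m], m))


def weighted_copeland(rankings, weights):
    """Weighted Copeland: in each pairwise contest, the module placed higher
    by more total voter weight earns 1 point (a tie gives 0.5 to each)."""
    modules = sorted(set(rankings[0]))
    positions = [{m: i for i, m in enumerate(r)} for r in rankings]
    total = sum(weights)
    score = {m: 0.0 for m in modules}
    for a, b in combinations(modules, 2):
        support_a = sum(w for pos, w in zip(positions, weights) if pos[a] < pos[b])
        support_b = total - support_a  # valid because rankings are complete
        if support_a > support_b:
            score[a] += 1
        elif support_b > support_a:
            score[b] += 1
        else:
            score[a] += 0.5
            score[b] += 0.5
    return sorted(modules, key=lambda m: (-score[m], m))


# Hypothetical defect probabilities from three trees for modules A, B, C.
tree_scores = [
    {"A": 0.9, "B": 0.4, "C": 0.1},
    {"A": 0.2, "B": 0.7, "C": 0.3},
    {"A": 0.8, "B": 0.6, "C": 0.5},
]
rankings = [ranking_from_scores(s) for s in tree_scores]
weights = [1.0, 2.5, 1.0]  # e.g. boosting-derived voter weights (hypothetical)
print(weighted_borda(rankings, weights))     # ['B', 'A', 'C']
print(weighted_copeland(rankings, weights))  # ['B', 'C', 'A']
```

With these weights, the two rules disagree on second place. Borda puts A second because of its top positions in two ballots. Copeland puts C second because C beats A head-to-head once the second tree's larger weight is counted. This sensitivity to the rule is why comparing decision-making rules is worthwhile.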
Keywords/Search Tags: Software Defect Prediction, Semi-supervised Learning, Rank of Software Defective Modules, Voting Theory