
Research On Software Defect Prediction Based On Active Learning

Posted on: 2020-07-22  Degree: Master  Type: Thesis
Country: China  Candidate: F F Wu  Full Text: PDF
GTID: 2428330575454947  Subject: Computer technology
Abstract/Summary:
With the rapid development of software technology, software products have penetrated all aspects of society, so software quality assurance is especially important for large-scale software projects. If software defects cannot be detected and repaired in time, they may cause huge economic losses and may even endanger human life. Software defect prediction guides quality assurance personnel to allocate resources reasonably by predicting the defect proneness of software modules. That is, quality assurance personnel can thoroughly test the modules with high defect proneness, reducing the resources wasted on testing defect-free modules.

However, building a defect prediction model requires a large amount of reliable training data. If a software project lacks effective training data, it is difficult to build an effective defect prediction model. To address this shortage, researchers have in recent years proposed using active learning to select instances directly from the target project, label them, and add them to the training set. Most existing methods rely on the active learning strategy of selecting the most uncertain instances. However, when the distribution of the initial training set is inconsistent with that of the overall data, this strategy tends to further aggravate the distribution bias of the training set, and if too few instances are selected, it is difficult to build a correct prediction model on that training set. In addition, defect prediction data sets suffer from class imbalance. While selecting instances, active learning does not address the resulting imbalance in the training set it constructs, which degrades the performance of the defect prediction model built on that training set.

This paper proposes the DAL method and the BDAL method to solve these two problems, respectively. DAL uses a dual uncertainty sampling strategy to comprehensively evaluate the uncertainty of instances from different feature subspaces, avoiding the sampling bias caused by relying on a single uncertainty measure over the full feature space. BDAL mitigates the class imbalance problem in the training set by synthesizing instances of the minority class. The main contributions of this paper are summarized as follows:

1. A within-project defect prediction method, DAL, based on active learning is proposed. To address the lack of effective training data in software defect prediction, this paper proposes DAL, based on a dual uncertainty sampling strategy in active learning, which aims to construct a higher-quality training set at the lowest labeling cost. This paper introduces the motivation for DAL and how the method is applied to defect prediction. An empirical study on the AEEEM and ReLink data sets then verifies the effectiveness of DAL. Finally, the advantages and disadvantages of DAL are discussed.

2. BDAL is proposed as a further improvement of DAL. To address the class imbalance problem in defect prediction data sets, this paper proposes BDAL, which extends DAL with the FS-BSMOTE over-sampling strategy. BDAL mitigates the class imbalance problem in the training set by synthesizing instances of the minority class, which effectively improves the recall of defective modules. This paper verifies the improved performance of BDAL through an empirical study on the AEEEM data set.
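
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how DAL combines the two uncertainty scores or how FS-BSMOTE chooses the instances it interpolates. The sketch assumes that the two subspace uncertainties are simply summed and that synthetic minority instances come from plain SMOTE-style interpolation between random pairs of minority instances. The function names `uncertainty`, `dual_uncertainty_select`, and `synthesize_minority` are hypothetical.

```python
import random


def uncertainty(p_defect):
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p_defect - 1.0)


def dual_uncertainty_select(probs_a, probs_b, k):
    """Return the indices of the k unlabeled instances to label next.

    probs_a and probs_b are predicted defect probabilities for the same
    unlabeled instances, produced by two classifiers trained on different
    feature subspaces. ASSUMPTION: the two uncertainties are summed; the
    abstract does not state the actual combination rule used by DAL.
    """
    scores = [uncertainty(a) + uncertainty(b) for a, b in zip(probs_a, probs_b)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def synthesize_minority(minority, n_new, rng):
    """Create n_new synthetic minority instances by linear interpolation.

    ASSUMPTION: plain SMOTE-style interpolation between two randomly chosen
    minority instances, standing in for FS-BSMOTE, whose details the
    abstract does not give. Requires at least two minority instances.
    """
    synthetic = []
    for _ in range(n_new):
        x, y = rng.sample(minority, 2)   # two distinct minority instances
        t = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append([xi + t * (yi - xi) for xi, yi in zip(x, y)])
    return synthetic


if __name__ == "__main__":
    # Defect probabilities from two hypothetical subspace classifiers.
    probs_a = [0.5, 0.9, 0.1, 0.6]
    probs_b = [0.5, 0.95, 0.45, 0.1]
    print(dual_uncertainty_select(probs_a, probs_b, k=2))

    # Defective (minority) modules described by two made-up metrics.
    defective = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
    print(synthesize_minority(defective, n_new=3, rng=random.Random(0)))
```

In an active learning loop, the selected indices would be sent to an expert for labeling, the labeled instances added to the training set, and the minority class then over-sampled before the prediction model is retrained.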
Keywords/Search Tags: Software Defect Prediction, Active Learning, Uncertainty Sampling, Data Imbalance