Software Defect Prediction Strategy Design For Imbalanced Data

Posted on: 2019-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Niu
Full Text: PDF
GTID: 2428330566976387
Subject: Software engineering
Abstract/Summary:
Software defect prediction is an active research topic in software testing. A good prediction strategy saves testing resources, reduces cost, and improves software quality. This thesis makes three contributions to software defect prediction.

First, software defect prediction faces two crucial problems: the class imbalance of the datasets and the parameter settings of the support vector machine (SVM). Existing studies mostly address only one of them, which may significantly reduce prediction accuracy. This thesis proposes a hybrid multi-objective cuckoo search undersampling method based on SVM (HMOCS-US-SVM) that optimizes both problems simultaneously, with the probability of false alarm (pf) and the probability of detection (pd) as the two objectives. Three undersampling strategies for class imbalance are designed: (1) samples are selected uniformly from all non-defective modules; (2) the K-means clustering algorithm divides all non-defective modules into several clusters, and samples are then selected uniformly from all clusters; (3) the K-means clustering algorithm divides all non-defective modules into several clusters, and samples are then selected from the single cluster that contains the most modules. The method is evaluated on eight benchmark datasets and compared with eight other prediction models. The results show that strategy (3) achieves the best performance.

Second, inspired by the oversampling approach of SMOTE (a well-known oversampling algorithm), the thesis proposes a hybrid multi-objective cuckoo search oversampling method based on SVM (HMOCS-SMOTE-SVM). This method optimizes the neighbor parameter of SMOTE and the parameters of SVM simultaneously. Experiments show that the proposed model effectively improves the performance of SMOTE.

Finally, to tackle the class imbalance of the datasets in cross-project software defect prediction, a three-stage data selection prediction model is designed. In the project selection phase, a hybrid similarity measure is proposed to select similar projects. In the instance selection phase, the Burak filter is employed. In the class imbalance phase, the proposed undersampling and oversampling prediction models are applied. The experimental results show that the proposed models achieve the best performance when compared with seven other prediction algorithms.
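The three undersampling strategies and the two objectives named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names `pd_pf` and `undersample` are hypothetical, cluster ids are assumed to come from any K-means implementation run beforehand, and "selected uniformly from all clusters" is interpreted here as taking samples from the clusters in turn. The cuckoo search and the SVM are not shown. The definitions pd = TP / (TP + FN) and pf = FP / (FP + TN) are the standard ones.

```python
import random
from collections import defaultdict


def pd_pf(y_true, y_pred):
    """Return (pd, pf) for binary labels: 1 = defective, 0 = non-defective."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pd = tp / (tp + fn) if tp + fn else 0.0  # probability of detection
    pf = fp / (fp + tn) if fp + tn else 0.0  # probability of false alarm
    return pd, pf


def undersample(modules, cluster_ids, n, strategy, rng):
    """Pick up to n non-defective modules.

    modules     -- list of non-defective module identifiers
    cluster_ids -- cluster id of each module (assumed precomputed by K-means)
    strategy    -- 1, 2 or 3, matching the numbering in the abstract
    rng         -- a random.Random instance, for reproducibility
    """
    if strategy == 1:
        # (1) uniform sampling over all non-defective modules
        return rng.sample(modules, min(n, len(modules)))

    groups = defaultdict(list)
    for module, cluster in zip(modules, cluster_ids):
        groups[cluster].append(module)

    if strategy == 2:
        # (2) assumption: visit the clusters in turn, one sample per visit
        pools = [rng.sample(g, len(g)) for g in groups.values()]  # shuffled copies
        picked = []
        while len(picked) < n and any(pools):
            for pool in pools:
                if pool and len(picked) < n:
                    picked.append(pool.pop())
        return picked

    if strategy == 3:
        # (3) sample only from the cluster with the most modules
        largest = max(groups.values(), key=len)
        return rng.sample(largest, min(n, len(largest)))

    raise ValueError("strategy must be 1, 2 or 3")


# Toy usage: ten modules in three clusters of sizes 6, 3 and 1.
modules = list(range(10))
cluster_ids = [0] * 6 + [1] * 3 + [2] * 1
rng = random.Random(0)
for strategy in (1, 2, 3):
    print(strategy, sorted(undersample(modules, cluster_ids, 4, strategy, rng)))
```

In the setting the abstract describes, the selected non-defective modules would be combined with the defective modules to train the SVM, and a multi-objective search would score each candidate by its (pd, pf) pair.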
Keywords/Search Tags: Software defect prediction, Class imbalance, SVM, Cross-project software defect prediction