Feature Selection Based On Cost Sensitive Learning For Software Defect Prediction

Posted on: 2013-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Chen
GTID: 2248330395452738
Subject: Computer application technology
Abstract/Summary:
To improve the efficiency and reduce the cost of software testing, it is important to estimate the defect-proneness of software modules. Defect-prone modules may cause software failures and increase development and maintenance costs. Accordingly, many machine learning and data mining methods have been applied to identify defect-prone modules, a process usually called software defect prediction.

For software datasets, three practical issues should be considered:
(1) The number of defect-prone software modules is usually much smaller than the number of non-defect-prone modules.
(2) The original software feature set usually contains irrelevant and redundant features, which may confuse the learning algorithm.
(3) Collecting enough labels for software modules is difficult, time-consuming, and extremely expensive.

To address these issues, this thesis proposes three feature selection algorithms for software defect prediction:
1. A global feature selection algorithm based on cost-sensitive SVM (FS-CSSVM). It is a feature ranking algorithm that sorts all software features by AUC, computed with the cost-sensitive classifier CSSVM. Experimental results on real-world software datasets show that the features selected by FS-CSSVM are more effective for software defect prediction.
2. A local feature subset selection algorithm based on cost-sensitive SVM (FSS-CSSVM). For the features in each category (LOC, Halstead, McCabe), sequential backward feature selection based on cost-sensitive SVM is applied, and redundant features are removed by mutual information. Experiments on NASA datasets show the effectiveness of FSS-CSSVM.
3. A semi-supervised feature selection algorithm based on cost-sensitive Laplacian SVM (FS-CSLapSVM). It is also a feature ranking algorithm, and it removes irrelevant software features. Moreover, it accounts for both the structural information of unlabeled software modules, through Laplacian SVM, and class imbalance, through cost-sensitive learning. Experimental results on NASA datasets show the validity of FS-CSLapSVM.
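
The abstract does not spell out how the AUC-based ranking is computed, so the following is only a minimal sketch of one plausible reading: train a cost-sensitive linear SVM on each feature alone and rank the features by the AUC of its decision values. The function names, the class-weighted hinge loss trained by plain subgradient descent, the cost ratio of 5:1, and the toy data are all assumptions made for illustration and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one module of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def standardize(col: Sequence[float]) -> List[float]:
    """Zero-mean, unit-variance scaling; constant columns map to all zeros."""
    mean = sum(col) / len(col)
    var = sum((v - mean) ** 2 for v in col) / len(col)
    std = var ** 0.5 or 1.0
    return [(v - mean) / std for v in col]


def fit_cost_sensitive_svm_1d(
    x: Sequence[float],
    y: Sequence[int],
    cost_pos: float = 5.0,   # assumed misclassification cost for defect-prone (+1)
    cost_neg: float = 1.0,   # assumed cost for non-defect-prone (-1)
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[float, float]:
    """Full-batch subgradient descent on a class-weighted hinge loss."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        grad_w, grad_b = lam * w, 0.0
        for xi, yi in zip(x, y):
            if yi * (w * xi + b) < 1.0:  # margin violated
                cost = cost_pos if yi == 1 else cost_neg
                grad_w -= cost * yi * xi / n
                grad_b -= cost * yi / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def rank_features_by_auc(
    X: Sequence[Sequence[float]], y: Sequence[int], cost_pos: float = 5.0
) -> List[Tuple[int, float]]:
    """Return (feature index, AUC) pairs sorted from highest to lowest AUC."""
    scored = []
    for j in range(len(X[0])):
        col = standardize([row[j] for row in X])
        w, b = fit_cost_sensitive_svm_1d(col, y, cost_pos=cost_pos)
        decision = [w * v + b for v in col]
        scored.append((j, auc(decision, y)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data (made up): 2 defect-prone modules out of 8, three features.
    # Feature 0 separates the classes, feature 1 is noisy, feature 2 is constant.
    X = [[90, 3, 7], [80, 1, 7], [10, 2, 7], [20, 4, 7],
         [15, 1, 7], [30, 3, 7], [25, 2, 7], [12, 4, 7]]
    y = [1, 1, -1, -1, -1, -1, -1, -1]
    for index, score in rank_features_by_auc(X, y):
        print(f"feature {index}: AUC = {score:.3f}")
```

In this sketch AUC is measured on the training data to keep the example short; a held-out or cross-validated AUC would be the more careful choice. Note also that with a single feature and a linear model, the AUC depends only on the sign of the learned weight, so the cost ratio would matter more for kernel or multi-feature variants.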
Keywords/Search Tags: software defect prediction, feature selection, SVM, cost sensitive, imbalance