Feature Selection Based On Cost Sensitive Learning For Software Defect Prediction

Posted on:2013-07-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Chen

GTID:2248330395452738

Subject:Computer application technology

Abstract/Summary:

To improve the efficiency and reduce the cost of software testing, it is important to estimate the defect-proneness of software modules. Defect-prone modules may cause software failures and increase development and maintenance costs. Accordingly, many machine learning and data mining methods have been applied to identify defect-prone modules, a process usually called software defect prediction.

For software datasets, three practical issues should be considered: (1) The number of defect-prone modules is usually much smaller than the number of non-defect-prone modules. (2) The original feature set usually contains irrelevant and redundant features, which may confuse the learning algorithm. (3) Collecting enough labels for software modules is difficult, time-consuming, and extremely expensive. To address these issues, this thesis proposes three feature selection algorithms for software defect prediction:

1. A global feature selection algorithm based on cost-sensitive SVM (FS-CSSVM). It is a feature ranking algorithm that sorts all software features by the AUC obtained with the cost-sensitive classifier CSSVM. Experimental results on real-world software datasets show that the features selected by FS-CSSVM are more effective for software defect prediction.

2. A local feature subset selection algorithm based on cost-sensitive SVM (FSS-CSSVM). For the features in each category (LOC, Halstead, McCabe), sequential backward feature selection based on cost-sensitive SVM is used, together with mutual information, to remove redundant features. Experiments on NASA datasets show the effectiveness of FSS-CSSVM.

3. A semi-supervised feature selection algorithm based on cost-sensitive Laplacian SVM (FS-CSLapSVM). It is also a feature ranking algorithm, and it removes irrelevant software features. Moreover, it accounts for both the structure information of unlabeled software modules, through Laplacian SVM, and class imbalance, through cost-sensitive learning. Experimental results on NASA datasets show the validity of FS-CSLapSVM.
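The feature-ranking idea behind FS-CSSVM can be illustrated with a small sketch. The abstract does not give the CSSVM formulation, so everything below is an assumption made for illustration: a one-dimensional linear SVM trained by full-batch subgradient descent on a cost-weighted hinge loss, a 5:1 misclassification cost ratio, AUC measured on the training data, and a made-up toy dataset. The names `auc`, `train_cs_svm_1d`, and `rank_features` are hypothetical, not taken from the thesis.

```python
def auc(scores, labels):
    """AUC as the fraction of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train_cs_svm_1d(xs, ys, cost_pos=5.0, cost_neg=1.0, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 * w^2 + mean_i c_i * max(0, 1 - y_i * (w * x_i + b)).

    c_i is cost_pos for defective modules (y = 1) and cost_neg otherwise, so
    errors on the rare defective class are penalized more heavily (assumed ratio).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = lam * w, 0.0
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1:  # margin violated: hinge term is active
                c = cost_pos if y == 1 else cost_neg
                grad_w -= c * y * x / n
                grad_b -= c * y / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def rank_features(X, y, **svm_kwargs):
    """Score each feature alone with the cost-sensitive SVM and sort by AUC, best first."""
    results = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        w, b = train_cs_svm_1d(column, y, **svm_kwargs)
        scores = [w * x + b for x in column]
        results.append((j, auc(scores, y)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy data (invented): 2 defective modules (+1) and 6 clean modules (-1).
# Feature 0 separates the classes, feature 1 is constant, feature 2 is noisy.
X = [
    [0.9, 0.5, 0.7], [0.8, 0.5, 0.3],
    [0.1, 0.5, 0.2], [0.2, 0.5, 0.6], [0.3, 0.5, 0.1],
    [0.1, 0.5, 0.4], [0.2, 0.5, 0.2], [0.0, 0.5, 0.5],
]
y = [1, 1, -1, -1, -1, -1, -1, -1]

ranking = rank_features(X, y)
for index, score in ranking:
    print(f"feature {index}: AUC = {score:.2f}")
```

In this sketch a feature that carries no information, such as the constant one, scores an AUC of 0.5 and falls behind the separating feature. A practical version would more likely estimate AUC with cross-validation and use a full SVM implementation rather than this hand-written trainer.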

Keywords/Search Tags:

software defect prediction, feature selection, SVM, cost sensitive, imbalance

Related items

1	Feature Selection Based On Cost Sensitive Learning For Software Defect Prediction
2	Cost-Sensitive Feature Selection Algorithms With Application In Software Defect Prediction
3	Research On Software Module Defect Prediction Method In Fire Maintenance System
4	Research On Software Defect Prediction Method Based On Feature Selection
5	Research On Software Defect Prediction Algorithm Based On Cost-sensitive Learning
6	Research And Application Of Feature Selection For Software Defect Data
7	Research On Software Defect Prediction Method Based On Cost Sensitive Learning
8	Research On High-dimensional Data Processing In Software Defect Prediction
9	Research On Software Defect Prediction Method Based On Fusion Feature Selection And Ensemble Learning
10	A Cost-sensitive Hybrid Software Defect Prediction Model