
Cost-Sensitive Feature Selection Algorithms With Application In Software Defect Prediction

Posted on: 2013-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: L S Miao
GTID: 2248330362970908
Subject: Computer application technology
Abstract/Summary:
With the growth of high-dimensional data accumulated in production, communications and research domains, state-of-the-art machine learning methods struggle to produce useful results efficiently. Feature selection can effectively reduce data dimensionality by removing irrelevant and redundant features, speed up data mining algorithms, and improve mining performance in terms of predictive accuracy and result comprehensibility. Traditional feature selection takes into account neither the class-imbalance problem nor the cost-sensitive problem, yet both problems are common in the real world. Therefore, we first study the class-imbalance and cost-sensitive problems and propose cost-sensitive feature selection algorithms. We then improve the cost-sensitive Laplacian score, which is based on an adjacency graph. The three main contributions of this thesis are summarized as follows:
Firstly, by incorporating cost information into traditional feature selection algorithms in a way similar to cost-sensitive learning, we develop three cost-sensitive feature selection algorithms, namely CSVS, CSLS and CSCS. They address the high-dimensionality, cost-sensitive and class-imbalance problems in the feature selection phase. Experimental results on UCI datasets and the NASA software defect prediction benchmark datasets demonstrate their efficacy.
Secondly, we propose the iterative cost-sensitive Laplacian score to address the problem that the adjacency graph is constructed by hand in advance and remains unchanged during feature selection. By updating the adjacency graph on which the cost-sensitive Laplacian score is based, the iterative cost-sensitive Laplacian score can evaluate the importance of features effectively.
Experimental results on UCI datasets and NASA datasets validate the effectiveness of the proposed method.
Finally, we propose a software defect prediction model based on double cost-sensitive learning, which takes cost information into account in both the feature selection and the learning phases. This model can effectively address the class-imbalance and cost-sensitive problems of software defect prediction. Experimental results on the NASA software defect prediction benchmark datasets demonstrate the efficacy of the proposed method.
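The abstract does not give the CSLS formulas or the iterative update rule. As a minimal sketch, assuming the standard Laplacian score is made cost-sensitive by building a supervised k-nearest-neighbour graph over same-class samples and scaling each edge by a per-class misclassification cost, the Python below scores features (lower means more important) and wraps the scoring in a loop that rebuilds the graph on the currently top-ranked features, in the spirit of the iterative method described above. The cost weighting, the same-class graph, the parameters `k`, `t`, `m` and `max_iter`, the stopping rule, and the names `cost_sensitive_laplacian_scores` and `iterative_cs_laplacian` are all hypothetical choices for illustration, not the thesis's algorithms.

```python
import math


def cost_sensitive_laplacian_scores(X, y, cost, k=3, t=1.0, features=None):
    """Laplacian score per feature (lower = more important).

    Hypothetical cost-sensitive variant: each sample is linked to its k nearest
    neighbours of the SAME class, and every heat-kernel edge weight is scaled
    by cost[label], the assumed misclassification cost of that class.
    `features` restricts the distance computation to a feature subset.
    """
    n, d = len(X), len(X[0])
    feats = list(range(d)) if features is None else list(features)

    def dist2(a, b):
        # Squared Euclidean distance over the chosen feature subset only.
        return sum((a[f] - b[f]) ** 2 for f in feats)

    # Symmetric weighted adjacency matrix S.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        same = [j for j in range(n) if j != i and y[j] == y[i]]
        same.sort(key=lambda j: dist2(X[i], X[j]))
        for j in same[:k]:
            w = cost[y[i]] * math.exp(-dist2(X[i], X[j]) / t)
            S[i][j] = S[j][i] = max(S[i][j], w)

    D = [sum(row) for row in S]  # degree of each node
    total = sum(D)

    scores = []
    for r in range(d):
        f = [X[i][r] for i in range(n)]
        # Remove the degree-weighted mean, as in the standard Laplacian score.
        mean = sum(f[i] * D[i] for i in range(n)) / total
        g = [v - mean for v in f]
        # g^T L g with L = D - S equals 1/2 * sum_ij S_ij (g_i - g_j)^2.
        num = 0.5 * sum(S[i][j] * (g[i] - g[j]) ** 2
                        for i in range(n) for j in range(n))
        den = sum(D[i] * g[i] ** 2 for i in range(n))  # g^T D g
        scores.append(num / den if den > 0 else float("inf"))
    return scores


def iterative_cs_laplacian(X, y, cost, m, max_iter=5, **kwargs):
    """Hypothetical iterative scheme: rebuild the graph on the current top-m
    features and re-score until the selected subset stops changing."""
    selected = None  # first pass builds the graph on all features
    scores = []
    for _ in range(max_iter):
        scores = cost_sensitive_laplacian_scores(
            X, y, cost, features=selected, **kwargs)
        ranked = sorted(range(len(scores)), key=lambda r: scores[r])
        new = sorted(ranked[:m])
        if new == selected:
            break  # subset is stable under the updated graph
        selected = new
    return selected, scores


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
     [1.0, 0.8], [1.1, 0.2], [0.9, 0.5]]
y = [0, 0, 0, 1, 1, 1]
cost = {0: 1.0, 1: 5.0}  # assumed: missing a defective module (class 1) costs 5x

print(cost_sensitive_laplacian_scores(X, y, cost, k=2))
print(iterative_cs_laplacian(X, y, cost, m=1, k=2))
```

On this toy data the discriminative feature 0 receives the lower score, and the iterative loop settles on `[0]` after the graph is rebuilt once. In a double cost-sensitive pipeline, the selected features would then be passed to a cost-sensitive classifier (for example, one trained with class weights); that second stage is not shown here.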
Keywords/Search Tags: cost-sensitive feature selection, software defect prediction, iterative feature selection, cost-sensitive learning