
Feature Selection Method Based On Incremental Clustering And ReliefF

Posted on: 2012-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Tong
Full Text: PDF
GTID: 2178330335956046
Subject: Computer software and theory

Abstract/Summary:
With the rapid development of computer science and the rise of artificial intelligence, pattern recognition has come into increasingly wide use. Pattern recognition tasks typically begin by collecting a very large number of original features, so the dimensionality of the original feature space can reach thousands or more, which greatly reduces the recognition rate and recognition accuracy. Feature selection is therefore a critical step in pattern recognition; it plays an important role in classification decisions and directly affects the recognition results.

In this dissertation, we focus on Relief-family feature selection methods, starting from a comprehensive analysis of existing feature selection methods. Compared with other feature selection methods, Relief has distinct advantages in both its time cost and the few restrictions it places on data types. We therefore take ReliefF, an improved version of Relief, as the starting point of our study. ReliefF is a supervised feature selection method that depends on class labels. Clustering, by contrast, is a basic tool for feature selection that needs no class labels, is well suited to large data sets, places no restrictions on data type, and can be applied wherever data reduction is needed. By combining clustering with ReliefF, we can apply ReliefF to large-scale data sets even when class labels are unavailable.

We study feature selection methods that combine incremental clustering with ReliefF. Our study shows that both the incremental clustering algorithm and the ReliefF algorithm have deficiencies, and that simply combining the two does not resolve them. Based on a comprehensive analysis of these problems, this dissertation proposes several improved strategies.
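For reference, the ReliefF weight update the dissertation builds on can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name, parameters, and sampling scheme are assumptions, and per-feature differences are normalized by each feature's range so hits and misses are comparable.

```python
import numpy as np

def relieff(X, y, n_samples=100, k=5, seed=0):
    """Sketch of the ReliefF feature-weight update.

    X: (n, d) numeric feature matrix; y: (n,) class labels.
    For each sampled instance, the k nearest hits (same class) decrease a
    feature's weight, and the k nearest misses of each other class,
    weighted by that class's prior, increase it.
    Returns a length-d weight vector; higher means more relevant.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    span = (X.max(axis=0) - X.min(axis=0)).astype(float)
    span[span == 0] = 1.0                      # avoid division by zero
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    m = min(n_samples, n)
    w = np.zeros(d)
    for i in rng.choice(n, size=m, replace=False):
        # per-feature absolute differences, normalized to [0, 1]
        diffs = np.abs(X - X[i]) / span
        dist = diffs.sum(axis=1)
        for c in classes:
            idx = np.flatnonzero(y == c)
            idx = idx[idx != i]                # exclude the instance itself
            near = idx[np.argsort(dist[idx])[:k]]
            contrib = diffs[near].mean(axis=0) / m
            if c == y[i]:
                w -= contrib                   # nearest hits
            else:                              # nearest misses, prior-weighted
                w += prior[c] / (1 - prior[y[i]]) * contrib
    return w
```

On data where one feature separates the classes and another is noise, the informative feature receives a clearly larger weight.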
Three improved strategies are proposed:

(1) Determine the cluster radius by setting two adjustable parameters, and use the least-distance principle to divide the data set into hyperspheres of nearly equal radius.

(2) Calculate the information entropy for different numbers of clusters, and select the number of clusters that yields the smallest information entropy.

(3) Handle redundancy among mixed attributes: use the correlation coefficient and mutual information to measure the correlation between features, and delete redundant features.

Combining these strategies, we propose a feature selection method based on incremental clustering and ReliefF, named ICB-ReliefF. To verify the efficiency of ICB-ReliefF, we conduct comparative experiments on UCI data sets. The experimental results show that, compared with existing methods, ICB-ReliefF improves markedly in both classification accuracy and feature subset size.
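Strategy (1), least-distance assignment into fixed-radius hyperspheres, can be sketched as a single-pass procedure. This is an illustrative sketch: a single `radius` parameter stands in for the thesis's two adjustable parameters, and the function name is hypothetical.

```python
import numpy as np

def incremental_cluster(X, radius):
    """Single-pass incremental clustering by the least-distance principle.

    Each point is assigned to the nearest existing cluster centre if it
    lies within `radius`; otherwise it seeds a new cluster, so every
    cluster is a hypersphere of (at most) the same radius.
    Returns (labels, centres).
    """
    centres, members, labels = [], [], []
    for x in X:
        if centres:
            d = np.linalg.norm(np.asarray(centres) - x, axis=1)
            j = int(d.argmin())                 # least-distance principle
            if d[j] <= radius:
                members[j].append(x)
                centres[j] = np.mean(members[j], axis=0)  # update centroid
                labels.append(j)
                continue
        centres.append(np.asarray(x, dtype=float))  # seed a new cluster
        members.append([x])
        labels.append(len(centres) - 1)
    return np.array(labels), np.asarray(centres)
```

On two well-separated groups of points, a radius smaller than the gap between them yields exactly two clusters.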
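Strategy (3), deleting redundant features, can likewise be sketched for the numeric case using the Pearson correlation coefficient; the thesis additionally uses mutual information for discrete attributes, which this sketch omits. The function name and threshold are assumptions.

```python
import numpy as np

def drop_redundant(X, weights, corr_thresh=0.9):
    """Remove redundant numeric features among those ranked relevant.

    Features are scanned in descending weight order; a feature whose
    absolute Pearson correlation with any already-kept feature exceeds
    `corr_thresh` is treated as redundant and dropped.
    Returns the indices of the kept features.
    """
    order = np.argsort(weights)[::-1]       # strongest features first
    kept = []
    for f in order:
        corrs = [abs(np.corrcoef(X[:, f], X[:, g])[0, 1]) for g in kept]
        if all(c < corr_thresh for c in corrs):
            kept.append(int(f))             # keep only non-redundant features
    return kept
```

For example, if one column is an exact multiple of another, the lower-weighted copy is removed while an uncorrelated column survives.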
Keywords: Feature selection, Relief, Incremental clustering