The Research And Application Of Clustering Feature Selection Methods

Posted on: 2010-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Chen
Full Text: PDF
GTID: 2178360278475467
Subject: Computer application technology
Abstract/Summary:
Classification based on the features of patterns is the principal task of pattern recognition. In general, a pattern can be classified correctly only when its features carry enough discriminative information. To improve classification accuracy, a large number of features is usually collected, so the original feature space may have thousands or tens of thousands of dimensions. This not only increases the dimensionality of the patterns but can also lower classification accuracy, because many of the features are correlated or redundant. This is the so-called curse of dimensionality. Reducing dimensionality is therefore a pivotal step in analyzing high-dimensional data effectively.

The purpose of this thesis is to explore a new approach to feature selection and to propose a feature ranking method for reducing feature dimensionality. The thesis briefly introduces the principle of feature dimensionality reduction and reviews the principal existing methods. It then discusses feature analysis and cluster validity criteria, focusing on how to judge the validity of a clustering and how to compute the similarity between features. The core of the work is a criterion based on clustering the features, together with the principles and methods for using this criterion to rank them.

For the feature selection problem, the approach starts from how each feature affects the classification result, the metric used to measure similarity between features, and feature analysis. The similarity metric is first used to build a feature similarity matrix. Then, taking into account each feature's influence on the classification result, together with clustering algorithms and cluster validity criteria, an algorithm based on feature clustering is put forward.

The earlier chapters outline the relevant background, including the K-means clustering algorithm, hierarchical clustering, and cluster validity criteria. A novel clustering-based feature ranking approach for feature selection is then proposed, and a simplified variant is introduced to handle unsupervised (unlabeled) data. Finally, the proposed algorithm is implemented in C++ and tested on many datasets. The experimental results demonstrate the validity and feasibility of the approach and its advantages over other methods.
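The abstract does not state which similarity formula the thesis uses, so the following is only a minimal sketch of the feature similarity matrix step under an assumption: similarity is taken to be the absolute Pearson correlation between two feature columns. The names `Matrix`, `feature_similarity`, and `similarity_matrix` are hypothetical and do not come from the thesis.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// features[j] holds the values of feature j across all samples.
using Matrix = std::vector<std::vector<double>>;

// ASSUMPTION: similarity = |Pearson correlation|. The thesis may use a
// different metric. Both columns are expected to have the same length.
double feature_similarity(const std::vector<double>& a,
                          const std::vector<double>& b) {
    const std::size_t n = a.size();
    double mean_a = 0.0, mean_b = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= static_cast<double>(n);
    mean_b /= static_cast<double>(n);

    double cov = 0.0, var_a = 0.0, var_b = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    const double denom = std::sqrt(var_a * var_b);
    // A constant feature has zero variance; treat it as dissimilar to all others.
    return denom > 0.0 ? std::fabs(cov / denom) : 0.0;
}

// Builds the symmetric d x d similarity matrix; the diagonal is fixed at 1.
Matrix similarity_matrix(const Matrix& features) {
    const std::size_t d = features.size();
    Matrix s(d, std::vector<double>(d, 1.0));
    for (std::size_t i = 0; i < d; ++i) {
        for (std::size_t j = i + 1; j < d; ++j) {
            s[i][j] = s[j][i] = feature_similarity(features[i], features[j]);
        }
    }
    return s;
}
```

A matrix of this kind could then be passed to a hierarchical or K-means style clustering of the features, so that highly similar (redundant) features fall into the same cluster. How the thesis ranks features within and across clusters is not described in the abstract.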
Keywords/Search Tags: Feature selection, correlation analysis, similarity measure, hierarchical clustering, K-means clustering algorithm, unsupervised