The Research And Application Of Clustering Feature Selection Methods

Posted on:2010-08-14

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Chen

GTID:2178360278475467

Subject:Computer application technology

Abstract/Summary:

Classification, the principal task of pattern recognition, relies on the features of the patterns. In general, a pattern can be classified correctly only when its features carry enough discriminative information. To improve classification accuracy, a large number of features is usually collected, so the original feature space can have thousands or tens of thousands of dimensions. This not only inflates the dimensionality of the patterns but also lowers classification accuracy, because many of the features are correlated or redundant. This is the so-called curse of dimensionality. Reducing the number of dimensions is therefore a pivotal step in analyzing high-dimensional data effectively.

The purpose of this thesis is to explore a new approach to feature selection and to propose a feature ranking method for reducing the dimensionality of the feature space. The principle of dimensionality reduction is briefly introduced, and the principal reduction methods are reviewed. The thesis also discusses feature analysis and cluster validity criteria, focusing on how to judge the effectiveness of a clustering and how to compute the similarity between features. The main work is to establish a criterion based on the clustering of features and to describe the principles and methods of using this criterion to rank the features. The approach draws on how each feature affects the classification result, on a metric formula for the similarity between features, and on feature analysis. The formula is first used to obtain a feature similarity matrix, and each feature's impact on the classification result is then taken into account. Finally, using clustering algorithms and cluster validity criteria, an algorithm based on clustering the features is put forward.

The earlier chapters outline the background knowledge, such as the K-means clustering algorithm, hierarchical clustering, and cluster validity criteria. A novel clustering-based feature ranking approach is then proposed for feature selection, and a simplified approach is introduced to deal with unsupervised data. Finally, the proposed algorithm is implemented in C++ and tested on many datasets. The experimental results demonstrate the validity and feasibility of the approach and its advantage over other methods.
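The pipeline outlined in the abstract (feature similarity matrix, clustering of features, ranking by impact on classification) can be illustrated with a minimal sketch. The abstract does not state the similarity formula, the stopping rule, or the ranking criterion, so everything specific here is an assumption made for illustration: absolute Pearson correlation as the similarity measure, average-linkage hierarchical clustering stopped at a hypothetical `threshold`, and absolute correlation with the class label as the relevance score. It is written in Python for brevity, not in the C++ of the original implementation, and all function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def similarity_matrix(features):
    """Pairwise |correlation| between feature columns (assumed similarity measure)."""
    m = len(features)
    return [[abs(pearson(features[i], features[j])) for j in range(m)]
            for i in range(m)]


def cluster_features(sim, threshold):
    """Average-linkage agglomerative clustering of feature indices.

    Merging stops once no pair of clusters has average similarity >= threshold.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if link > best:
                    best, pair = link, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def rank_features(features, labels, threshold=0.8):
    """Return one representative feature index per cluster, most relevant first."""
    # Assumed relevance score: |correlation| between a feature and the class label.
    relevance = [abs(pearson(col, labels)) for col in features]
    clusters = cluster_features(similarity_matrix(features), threshold)
    representatives = [max(c, key=lambda i: relevance[i]) for c in clusters]
    return sorted(representatives, key=lambda i: relevance[i], reverse=True)


if __name__ == "__main__":
    # Toy data: f1 is a noisy scaled copy of f0 (redundant), f2 is unrelated.
    f0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    f1 = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]
    f2 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
    y = [0, 0, 0, 1, 1, 1]
    print(rank_features([f0, f1, f2], y))
```

On the toy data, the two nearly collinear features fall into one cluster and contribute a single representative, so the redundant copy is dropped while the unrelated feature is kept and ranked last. For unlabeled data, the relevance score would have to be replaced by a label-free criterion, which the abstract mentions but does not specify.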

Keywords/Search Tags:

Feature selection, correlation analysis, similarity measure, hierarchical clustering, K-means clustering algorithm, unsupervised

Related items

1	Research On Feature Selection Algorithm Based On Similarity
2	Research On Feature Selection Method Based On Clustering Ensemble
3	New Non-hierarchical Clustering Objectives And The Algorithms To Optimal Clustering
4	Research On Non-IID K-means Clustering Algorithm
5	Research For Feature Selection Algorithm Based On Text Clustering
6	Research On Optimization Methods For Kernel K-means
7	Study On The Application Of The Improved K-means Clustering Algorithm In Image Retrieval
8	Research And Application Of Max-Correlation And Mix-Redundancy Unsupervised Feature Selection
9	Interactive Features And Adaptive Clustering Algorithm
10	Algorithms Research Based On Multiple Hierarchical Clustering