Research Of Feature Selection In Clusters Analysis

Posted on:2013-08-18

Degree:Master

Type:Thesis

Country:China

Candidate:Q W Qin

Full Text:PDF

GTID:2248330374456471

Subject:Computer software and theory

Abstract/Summary:

Cluster analysis is an important tool in machine learning and data mining, and clustering plays an important role in discovering hidden knowledge and the internal structure of data. With the rapid development of computer, network, and database technology, data acquisition, transmission, and storage have become much easier and faster, producing large volumes of data whose structure is increasingly complex. On such complex data, current feature selection and machine learning algorithms struggle to achieve good results, so there is an urgent need for algorithms with better accuracy and efficiency. Reducing the number of dimensions is therefore a pivotal step in analyzing high-dimensional data effectively.

This thesis investigates feature selection in large-scale data clustering analysis. The main contributions are summarized as follows:

(1) The problems encountered in large-scale data processing are analyzed, and feature selection algorithms for cluster analysis are summarized and classified. Focusing on the relationship between features and categories, a neighborhood distance is introduced, and on this basis an evaluation index is given that measures a feature's clustering ability.

(2) Based on the new feature evaluation index and combined with a heuristic search strategy, an algorithm is proposed to find the important features in categorical data. Compared with traditional clustering analysis algorithms, experimental results demonstrate that the proposed algorithm is effective in terms of clustering accuracy and time consumption.

(3) The proposed feature selection method is applied to semi-supervised clustering analysis, where it also outperforms traditional clustering analysis algorithms in clustering accuracy and time consumption.

These contributions further enrich research on unsupervised feature selection and provide a new approach and perspective for practical clustering analysis of high-dimensional complex data.
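The abstract does not define the neighborhood distance or the evaluation index, so the following is only a minimal sketch of the general idea of pairing a feature evaluation score with a heuristic (greedy forward) search on categorical data. Everything in it is an assumption for illustration: the Hamming-based neighborhood, the label-consistency score (borrowed in spirit from neighborhood rough set dependency), the `delta` threshold, and the names `hamming`, `consistency`, and `greedy_select` are hypothetical and are not the thesis's actual method.

```python
from typing import List, Sequence


def hamming(a: Sequence, b: Sequence, feats: Sequence[int]) -> int:
    """Number of selected features on which two objects differ."""
    return sum(a[f] != b[f] for f in feats)


def consistency(X: Sequence[Sequence], labels: Sequence[int],
                feats: Sequence[int], delta: int = 0) -> float:
    """Assumed stand-in score: fraction of objects whose delta-neighborhood
    (under Hamming distance on `feats`) contains a single label."""
    if not feats:
        return 0.0
    pure = 0
    for i, x in enumerate(X):
        neighbors = [j for j, y in enumerate(X) if hamming(x, y, feats) <= delta]
        if all(labels[j] == labels[i] for j in neighbors):
            pure += 1
    return pure / len(X)


def greedy_select(X: Sequence[Sequence], labels: Sequence[int],
                  delta: int = 0) -> List[int]:
    """Greedy forward search: repeatedly add the feature that yields the
    highest score, and stop as soon as no candidate improves it."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best = 0.0
    while remaining:
        # max() returns the first candidate among ties, so lower indices win.
        feat = max(remaining, key=lambda f: consistency(X, labels, selected + [f], delta))
        score = consistency(X, labels, selected + [feat], delta)
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected


# Toy categorical data: features 0 and 2 separate the two groups, feature 1 does not.
X = [("a", "x", "p"), ("a", "y", "p"), ("b", "x", "q"), ("b", "y", "q")]
labels = [0, 0, 1, 1]  # e.g. labels from a preliminary clustering (assumed)
print(greedy_select(X, labels))
```

On this toy data the search keeps a single feature, because feature 0 alone already makes every neighborhood label-consistent and no further feature raises the score. The labels here play the role of the "categories" mentioned in contribution (1); in a purely unsupervised setting they would have to come from some preliminary partition.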

Keywords/Search Tags:

Feature selection, Clustering algorithm, Neighborhood distance, Attribute significance

Related items

1	Research On Robust Fuzzy Clustering Algorithm Based On Feature Selection
2	Research On Feature Selection Method Based On Neighborhood Rough Set
3	Feature Selection Of Information Systems Based On Neighborhood Tolerance Rough Sets
4	Research On Feature Selection Method Based On Three-way Decisions Theory And Feature Clustering
5	Research And Application Of Spectral Clustering Algorithm
6	Research On Online Streaming Feature Selection Algorithms
7	Study On Feature Selection Based On Neighborhood Rough Set
8	Research On Neighborhood Detector Generating Algorithm And Model Based On Clustering
9	Research On The Gene Selection Based On Neighborhood Mutual Information
10	The Research And Application Of Clustering Feature Selection Methods