Research on Feature Selection in Cluster Analysis

Posted on: 2013-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Qin
Full Text: PDF
GTID: 2248330374456471
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an important tool in machine learning and data mining, and clustering plays an important role in discovering hidden knowledge and the internal structure of data. With the rapid development of computer, network, and database technology, data acquisition, transmission, and storage have become much easier and faster, producing large volumes of data whose forms are increasingly complex. On such complex data, current feature selection and machine learning algorithms struggle to achieve good results, so there is an urgent need for algorithms with better accuracy and efficiency. To analyze high-dimensional data effectively, reducing the number of dimensions is a pivotal step.

This thesis investigates the problem of feature selection in the clustering analysis of large-scale data. The main contributions are summarized as follows:

(1) The problems encountered in large-scale data processing are analyzed, and the feature selection algorithms used in cluster analysis are summarized and classified. Focusing on the relationship between features and categories, a neighborhood distance is introduced, and on this basis an evaluation index is given that measures a feature's ability to support clustering.

(2) Based on the new feature evaluation index and combined with a heuristic search strategy, an algorithm is proposed to find the important features in categorical data. Experimental results show that, compared with traditional clustering analysis algorithms, the proposed algorithm is effective in terms of clustering accuracy and time consumption.

(3) The proposed feature selection method is applied to semi-supervised clustering analysis, where it also outperforms traditional clustering analysis algorithms in clustering accuracy and time consumption.

These contributions further enrich research on unsupervised feature selection and provide a new approach and perspective for the practical clustering analysis of high-dimensional complex data.
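The abstract does not define the neighborhood distance or the evaluation index, so the following is only a minimal sketch of how an evaluation index and a heuristic search can be combined for categorical data. Everything in it is an assumption made for illustration, not the thesis's method: the simple matching distance, the Jaccard-based `subset_score`, the greedy forward search, and all function names and parameters (`delta`, `min_gain`) are hypothetical stand-ins.

```python
from typing import List, Sequence, Set, Tuple

Record = Sequence[str]


def matching_distance(x: Record, y: Record, features: List[int]) -> float:
    """Fraction of the given features on which two records disagree (assumed distance)."""
    return sum(x[f] != y[f] for f in features) / len(features)


def neighborhoods(data: List[Record], features: List[int], delta: float) -> List[Set[int]]:
    """For each record, the indices of all records within `delta` on `features`."""
    return [
        {j for j, y in enumerate(data) if matching_distance(x, y, features) <= delta}
        for x in data
    ]


def subset_score(data: List[Record], features: List[int],
                 reference: List[Set[int]], delta: float) -> float:
    """Stand-in evaluation index: mean Jaccard agreement between the
    neighborhoods induced by `features` and those induced by all features."""
    hoods = neighborhoods(data, features, delta)
    return sum(len(h & r) / len(h | r) for h, r in zip(hoods, reference)) / len(data)


def greedy_select(data: List[Record], delta: float = 0.0,
                  min_gain: float = 1e-9) -> Tuple[List[int], float]:
    """Heuristic forward search: repeatedly add the feature that raises the
    score most, and stop when no feature improves it by more than `min_gain`."""
    all_features = list(range(len(data[0])))
    reference = neighborhoods(data, all_features, delta)
    selected: List[int] = []
    remaining = set(all_features)
    best = 0.0
    while remaining:
        # Ties are broken in favor of the lowest feature index.
        score, _, feature = max(
            (subset_score(data, selected + [f], reference, delta), -f, f)
            for f in remaining
        )
        if score - best <= min_gain:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


# Toy categorical data: the third column is constant and carries no structure.
data = [
    ("a", "x", "k"),
    ("a", "x", "k"),
    ("a", "y", "k"),
    ("b", "y", "k"),
    ("b", "y", "k"),
    ("b", "x", "k"),
]
selected, best = greedy_select(data)
print(selected, best)
```

On this toy data the search keeps the two informative columns and drops the constant one. Each step rescans all pairs of records, so the sketch is only suitable for small inputs; it says nothing about the efficiency the thesis reports.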
Keywords/Search Tags: Feature selection, Clustering algorithm, Neighborhood distance, Attribute significance