Research On Clustering Methods For High Dimensional Data And Their Application

Posted on: 2018-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: F Su
Full Text: PDF
GTID: 2348330518475553
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Cluster analysis is a central topic in data mining. It aims to partition a mixed set of data objects into clusters on the basis of their similarity, so that similar objects are grouped together while dissimilar objects fall into different clusters. With the growing prevalence of high-dimensional data, sparsity and the curse of dimensionality greatly reduce the effectiveness of traditional clustering algorithms and can even render them invalid. Clustering high-dimensional data has therefore become an active and difficult research problem, and dimensionality reduction is an effective way to address it. This paper first optimizes the traditional K-means algorithm and then applies it to dimensionality-reduced data sets.

Because the traditional K-means algorithm draws its initial cluster centers at random from the data set, its results vary from run to run and each clustering often needs many iterations. To overcome these problems, the DPK-means algorithm is proposed, which optimizes the initial cluster centers on the basis of local density. It combines the traditional K-means algorithm with the fast search for density peaks used in the DPC algorithm. For each point i, the local density ρ_i and the distance δ_i from points of higher density are computed, and the K points with the highest ρ_i and δ_i are chosen as the initial cluster centers; outliers, by contrast, have a higher δ_i and a smaller ρ_i. The traditional K-means algorithm is then applied to perform the clustering.

In the experiments, the dimensionality reduction algorithms PCA, MDS, ISOMAP, and LLE are first used to eliminate redundant dimensions, and the resulting feature subsets are assessed by the performance of the clustering algorithm on them. The DPK-means algorithm is then applied to cluster the reduced data. Using the standard UCI data sets and the Motion-segment data set for comparison experiments, the clustering results are evaluated in terms of clustering quality and number of iterations, demonstrating that the improved algorithm enhances clustering accuracy and stability on dimensionality-reduced data sets.

In summary, this paper starts from the concept of data mining and focuses on the problem of clustering high-dimensional data. With the aid of the dimensionality reduction algorithms PCA, MDS, ISOMAP, and LLE, the DPK-means algorithm is applied to cluster the UCI and Motion-segment data sets. It performs better than the traditional K-means algorithm, but the clustering of discrete data and of mixed non-spherical data requires further study.
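
The initialization idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the density kernel, the rule for combining ρ_i and δ_i, or the tie-breaking scheme. The sketch therefore assumes a cutoff kernel with a hypothetical radius `d_c`, ranks points by the product ρ_i · δ_i, and breaks density ties by index. The function names `density_peak_centers` and `kmeans` are illustrative only.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_peak_centers(points, k, d_c):
    """Pick k initial centers with high local density and high delta.

    Assumptions (not stated in the abstract): cutoff kernel of radius d_c,
    ranking by gamma_i = rho_i * delta_i, density ties broken by index.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest (stable sort keeps index order on ties)
    by_rho = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    # The densest point has no denser neighbor: use its largest distance
    delta[by_rho[0]] = max(d[by_rho[0]])
    for pos in range(1, n):
        i = by_rho[pos]
        # delta_i: distance to the nearest point that ranks higher in density
        delta[i] = min(d[i][j] for j in by_rho[:pos])
    # Outliers have large delta but small rho, so their product stays small
    order = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [points[i] for i in order[:k]]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            return labels, centers, iteration
        centers = new_centers
    return labels, centers, max_iter


# Toy data: two compact groups plus one isolated point
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
          (5.0, 30.0)]
initial = density_peak_centers(points, k=2, d_c=1.5)
labels, centers, iterations = kmeans(points, initial)
print(initial, labels, iterations)
```

Because the initial centers are derived from the data rather than drawn at random, repeated runs on the same data set start from the same centers, which is the source of the stability the abstract claims. On high-dimensional data, a sketch like this would be run on the output of a dimensionality reduction step such as PCA rather than on the raw features.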
Keywords/Search Tags:High-dimensional data, K-means algorithm, DPK-means algorithm, Local density, Initial clustering centers