Research On Clustering Methods For High Dimensional Data And Their Application

Posted on:2018-08-21

Degree:Master

Type:Thesis

Country:China

Candidate:F Su

GTID:2348330518475553

Subject:Probability theory and mathematical statistics

Abstract/Summary:

Cluster analysis is an important topic in data mining. It aims to partition a mixed set of data objects into clusters on the basis of their similarity, so that similar objects are grouped together while dissimilar objects fall into different clusters. As high-dimensional data becomes prevalent, sparsity and the curse of dimensionality greatly reduce the effectiveness of traditional clustering algorithms and can even render them invalid. Clustering of high-dimensional data has therefore become an active and difficult research problem, and dimensionality reduction is an effective way to approach it. This thesis first optimizes the traditional K-means algorithm and then applies it to dimensionality-reduced data sets.

Because the traditional K-means algorithm draws its initial cluster centers at random from the data set, its results vary from run to run and it can need many iterations to converge. To overcome these problems, the DPK-means algorithm is proposed, which optimizes the initial cluster centers using local density. It combines traditional K-means with the fast density-peak search of the DPC algorithm. For each point i, the local density ρ_i and the distance δ_i to the nearest point of higher density are computed. The K points with the highest local density and the largest distance from points of higher density are chosen as the initial cluster centers; outliers, by contrast, have a larger δ_i and a smaller ρ_i. The traditional K-means algorithm is then run from these centers.

For high-dimensional data, dimensionality-reduction algorithms, including PCA, MDS, ISOMAP, and LLE, are first used to eliminate redundant dimensions, and the resulting feature subsets are assessed by the performance of the clustering algorithm on them. DPK-means is then applied to the reduced data. Using standard UCI data sets and the Motion-segment data set for comparative experiments, the clustering results are evaluated in terms of clustering quality and number of iterations, demonstrating that the improved algorithm enhances clustering accuracy and stability on dimensionality-reduced data sets.

In summary, this thesis starts from the concept of data mining and focuses on clustering analysis for high-dimensional data. With the aid of the dimensionality-reduction algorithms PCA, MDS, ISOMAP, and LLE, DPK-means is applied to the UCI and Motion-segment data sets. It performs better than the traditional K-means algorithm, but clustering of discrete data and of mixed non-spherical data requires further study.
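
The density-based initialization described in the abstract might be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the cutoff distance `d_c`, the neighbor-count definition of local density, and the ranking of candidates by the product of density and distance are all assumptions borrowed from the usual formulation of DPC, since the abstract does not state how the two quantities are combined.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_peak_centers(points, k, d_c):
    """Choose k initial centers with high local density and high separation.

    Assumptions (not stated in the abstract): density is the number of
    neighbors closer than the cutoff d_c, and candidates are ranked by
    the product density * distance.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # Local density: how many other points lie within the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(d[i])
        else:
            # Distance to the nearest point that precedes i in density order.
            delta[i] = min(d[i][j] for j in order[:rank])
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [points[i] for i in ranked[:k]]

def kmeans(points, centers, max_iter=100):
    """Plain K-means started from the given centers.

    Returns (labels, centers, passes), where passes counts assignment
    passes including the final one that detects no change.
    """
    centers = [list(c) for c in centers]
    labels = None
    passes = 0
    for passes in range(1, max_iter + 1):
        new_labels = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers, passes

# Toy data: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
initial = density_peak_centers(points, k=2, d_c=1.0)
labels, centers, passes = kmeans(points, initial)
print(initial, labels, passes)
```

Because the initial centers are computed deterministically from the data rather than sampled, repeated runs on the same data set give the same clustering, which is the stability property the abstract attributes to DPK-means. The pairwise distance matrix makes this sketch quadratic in the number of points, so it is only suitable for small data sets.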

Keywords/Search Tags:

High-dimensional data, K-means algorithm, DPK-means algorithm, Local density, Initial clustering centers

Related items

1	Study On Problems To Select Initial Cluster Centers Of The K-means Algorithm
2	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
3	Research On Improvement Of K-means Clustering Algorithm
4	K-means Algorithm For Optimizing Initial Clustering Centers Based On Improved Density Peak
5	Study On Improvement Of K-means Clustering Algorithm
6	Research And Application Of K-means Clustering Algorithm
7	Improvements And Implementation Of K-means Clustering Algorithm
8	PSO-based Spatial Data Clustering Model And Its Application
9	Improved K-means Clustering Based On Genetic Algorithm
10	The Research And Application Of Text Clustering Based On Improved K-means Algorithm