Research On Subspace Clustering Algorithm For High Dimensional Data

Posted on:2016-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:L X Wang

Full Text:PDF

GTID:2308330461471343

Subject:Computer system architecture

Abstract/Summary:

Traditional clustering algorithms are generally based on distance, yet the distances between samples become almost indistinguishable as the dimensionality of the samples increases. Distance-based clustering therefore faces a serious challenge in high-dimensional space. This problem, however, does not necessarily arise in certain local subspaces, so finding an effective clustering algorithm for high-dimensional data has become an active research direction.

Two approaches to high-dimensional clustering are currently the most common. The first reduces the dimensionality of the data and then clusters with a traditional method: a dimension reduction algorithm eliminates the redundant dimensions, leaving only a subspace of high interest, and a traditional distance-based clustering algorithm is then applied in that subspace. The second selects a feature subset and clusters in the corresponding subspace: the algorithm searches the high-dimensional space for subspaces with higher information content. Because it must traverse every dimension, it is more expensive than the first approach. This thesis addresses each of the two approaches in turn. In the third chapter, the Isomap algorithm is chosen to reduce the dimensionality of the data points, which are then clustered with an improved k-means algorithm.
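The loss of distance contrast that motivates this work can be checked numerically. The following is a minimal sketch, not taken from the thesis: it assumes points drawn uniformly from the unit hypercube and Euclidean distance, and the function name `relative_contrast` is hypothetical. It measures how far apart the nearest and farthest neighbours of a query point are, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    query point to n_points points, all drawn uniformly from [0, 1)^dim.

    The uniform distribution is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast shrinks as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

With the contrast close to zero, the nearest and farthest points are almost equally far away, which is why a purely distance-based criterion separates clusters poorly in the full space.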
The improved k-means algorithm strengthens the compactness of the data by eliminating outliers according to differences in distance similarity, and it selects the initial cluster centers in a principled way. Clustering after dimension reduction, however, imposes many requirements on the data and is not universally applicable. The fourth chapter therefore presents an improved CLIQUE algorithm for clustering in high-dimensional space. Based on an analysis of the problems of traditional distance-based clustering, a dimension reduction method based on the Gini value is proposed to preprocess the high-dimensional data. At the same time, the CLIQUE algorithm itself is improved: dense cells are divided with a constrained adaptive grid method, which guarantees that a dense unit is not split across two clusters, and an effective database D keeps a backup of the dense cells. The improved algorithm outperforms the original in both search speed and clustering accuracy.
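For orientation, the bottom-up search of the original CLIQUE algorithm can be sketched as follows. This is only the baseline with a fixed equal-width grid; it does not reproduce the thesis's Gini-based preprocessing, its constrained adaptive grid, or the database D. The parameters `xi` (intervals per dimension) and `tau` (density threshold) follow the usual CLIQUE notation, while the function names and the toy data are hypothetical, and all coordinates are assumed to lie in [0, 1).

```python
from collections import Counter
from itertools import combinations


def cell(value, xi):
    """Index of the equal-width interval (out of xi) that contains value."""
    return min(int(value * xi), xi - 1)


def dense_units_1d(points, xi, tau):
    """Return {dim: set of interval indices} for the 1-D units whose share
    of all points exceeds tau."""
    n = len(points)
    dense = {}
    for d in range(len(points[0])):
        counts = Counter(cell(p[d], xi) for p in points)
        cells = {c for c, k in counts.items() if k / n > tau}
        if cells:
            dense[d] = cells
    return dense


def dense_units_2d(points, xi, tau, dense_1d):
    """Build 2-D candidates only from pairs of dense 1-D units (the
    Apriori-style pruning CLIQUE relies on) and keep the dense ones."""
    n = len(points)
    result = {}
    for d1, d2 in combinations(sorted(dense_1d), 2):
        counts = Counter()
        for p in points:
            c1, c2 = cell(p[d1], xi), cell(p[d2], xi)
            if c1 in dense_1d[d1] and c2 in dense_1d[d2]:
                counts[(c1, c2)] += 1
        cells = {c for c, k in counts.items() if k / n > tau}
        if cells:
            result[(d1, d2)] = cells
    return result


def demo_points():
    """Toy data: tightly grouped in dimensions 0 and 1, spread evenly
    over dimension 2."""
    return [
        (0.21 + 0.0008 * i, 0.71 + 0.0008 * i, (i % 10) / 10 + 0.05)
        for i in range(100)
    ]


if __name__ == "__main__":
    pts = demo_points()
    one_d = dense_units_1d(pts, xi=10, tau=0.2)
    print(one_d)
    print(dense_units_2d(pts, xi=10, tau=0.2, dense_1d=one_d))
```

In the toy data, dimension 2 contributes no dense unit, so it never enters a 2-D candidate; this pruning is what keeps the subspace search tractable. A fixed grid can cut a dense region at an interval boundary, which is the weakness the adaptive grid described in the abstract is meant to address.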

Keywords/Search Tags:

subspace clustering, high-dimensional clustering, data mining, cluster analysis

Related items

1	Study On High-dimensional Data Subspace Clustering Analysis And Application
2	Research On Clustering Algorithms For High-Dimensional Data
3	Research On Subspace Clustering Algorithms Based On Density
4	Research On Improved Subspace Clustering Algorithm
5	The Research On Subspace Clustering For High Dimensional Data
6	Research On Clustering Algorithm For High Dimensional Data
7	Research On Subspace Clustering Algorithms For High-dimensional Data
8	Research And Application Of Soft Subspace Clustering Algorithms
9	Research On Clustering Analysis And Its Applications In Telecom
10	A New High-dimensional Data Clustering Algorithm Based On GAs