
Research On Subspace Clustering Algorithm For High Dimensional Data

Posted on: 2016-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: L X Wang
Full Text: PDF
GTID: 2308330461471343
Subject: Computer system architecture
Abstract/Summary:
Traditional clustering algorithms are generally based on distance, yet the distances between samples become almost indistinguishable as the sample dimension increases. Distance-based clustering therefore faces a major challenge in high-dimensional space. This limitation does not necessarily hold in some local subspaces, however, so finding an effective clustering algorithm for high-dimensional data has become an active research direction. At present, two approaches to high-dimensional clustering are most common. The first reduces the dimensionality of the data and then clusters with a traditional method: a dimension reduction algorithm eliminates the redundant dimensions, leaving only a subspace of high interest, and a traditional distance-based clustering algorithm is then applied in that subspace. The second selects a feature subset and clusters in the corresponding subspace: the algorithm searches the high-dimensional space for subspaces with higher information content. Because it must traverse every dimension, its cost is higher than that of the first approach. This thesis addresses each of the two approaches in turn. In the third chapter, the Isomap algorithm is used to reduce the dimensionality of the data points, and an improved k-means algorithm then clusters them.
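The observation at the start of the abstract, that distances lose their discriminating power as the dimension grows, can be illustrated with a small experiment. The sketch below is not taken from the thesis. It assumes points drawn uniformly from the unit hypercube and uses a hypothetical helper, `relative_contrast`, which measures how much the farthest point differs from the nearest one relative to the nearest distance.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for distances from one query point.

    Points are sampled uniformly from the unit hypercube (an assumption
    made for illustration only).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


# Compare the contrast at a few dimensionalities.
for dim in (2, 10, 100, 500):
    print(dim, round(relative_contrast(200, dim), 3))
```

Under these assumptions the contrast shrinks sharply as `dim` increases, which is why a purely distance-based criterion struggles to separate near from far neighbours in the full space.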
In the improved k-means algorithm, outliers are eliminated according to differences in distance similarity, which makes the data more compact, and the initial cluster centers are selected more systematically. However, clustering after reducing the data to a lower dimension imposes many requirements and is not universally applicable. The fourth chapter therefore presents an improved CLIQUE clustering algorithm for high-dimensional space. Based on an analysis of the problems of traditional distance-based clustering, a dimension reduction method based on the Gini value is proposed to preprocess the high-dimensional data. The CLIQUE algorithm itself is also improved: dense cells are divided using an adaptive grid method with constraints, which guarantees that a dense unit is not split between two clusters, and a backup of the dense cells is kept in the effective database D. The improved algorithm outperforms the original in both search speed and clustering accuracy.
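The abstract does not give the details of the Gini-based preprocessing or of the adaptive grid, so the following is only a rough sketch of the general idea under stated assumptions. It assumes the "Gini value" of a dimension is the Gini impurity of its equal-width histogram, so that a nearly uniform dimension scores high and is dropped. It then finds dense cells on a fixed equal-width grid as in the original CLIQUE, not the constrained adaptive grid of the thesis. All function names, the bin count, and the thresholds are hypothetical.

```python
import random
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to an equal-width bin index in [0, bins - 1]."""
    width = (hi - lo) / bins or 1.0  # guard against a constant dimension
    return min(int((value - lo) / width), bins - 1)


def gini_impurity(column, bins=10):
    """Gini impurity 1 - sum(p_i^2) of the histogram of one dimension."""
    lo, hi = min(column), max(column)
    counts = Counter(bin_index(v, lo, hi, bins) for v in column)
    n = len(column)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def select_dimensions(points, bins=10, max_gini=0.8):
    """Keep dimensions whose histogram is concentrated (assumed criterion)."""
    dims = range(len(points[0]))
    return [d for d in dims
            if gini_impurity([p[d] for p in points], bins) <= max_gini]


def dense_cells(points, dims, bins=10, min_points=10):
    """Count points per grid cell over `dims`; keep cells above the threshold."""
    bounds = {d: (min(p[d] for p in points), max(p[d] for p in points))
              for d in dims}
    counts = Counter(
        tuple(bin_index(p[d], *bounds[d], bins) for d in dims) for p in points
    )
    return {cell: c for cell, c in counts.items() if c >= min_points}


# Synthetic data: two clusters in dimensions 0 and 1, uniform noise in dimension 2.
rng = random.Random(0)
points = []
for center in (0.2, 0.8):
    for _ in range(100):
        points.append([rng.gauss(center, 0.02),
                       rng.gauss(center, 0.02),
                       rng.random()])

kept = select_dimensions(points)
cells = dense_cells(points, kept)
print(kept, cells)
```

On this synthetic data the noisy third dimension is filtered out before the grid search, which shrinks the number of cells to examine. Connecting adjacent dense cells into clusters, and adapting the grid boundaries so that a dense unit is not cut in two, are the steps the thesis improves and are not shown here.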
Keywords/Search Tags: subspace clustering, high-dimensional clustering, data mining, cluster analysis