Font Size: a A A

Research On Subspace Clustering Algorithm Based On Finding Effective Dimension

Posted on:2013-09-16Degree:MasterType:Thesis
Country:ChinaCandidate:Z G LiuFull Text:PDF
GTID:2248330392954638Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In the field of data mining, subspace clustering is an important research hotspot.It has been widely used in many areas. Traditional clustering algorithms clusterdirectly in the original data space, but because of the sparse of the data in the originaldata space, it can not obtain the desired clustering results. To solve the above defect,this paper will focus on the research of the subspace clustering algorithm based onfinding effective space and subspace clustering algorithm based detecting critical gridfor data stream.First, we analyze the requirements of subspace clustering and the techniques fordimension reduction in high-dimensional data, at last, we analyze the classic subspaceclustering algorithms, including the top-down clustering algorithms and bottom-upclustering algorithms, and point out the advantages and disadvantages of thealgorithms.Second, in order to solve the problem that the algorithms based on grid discardthe sparse grids which are part of a cluster, we propose a subspace clustering algorithmbased on finding effective spaces. First, the algorithm finds the effective dimensionsaccording to relative entropy in every dimension. Then, it gets the high-dimensionaleffective spaces through a bottom-up iterative process, in the process, it uses themonotonicity of the clustering criterion to prune candidates, in addition, it also uses apruning function based on undirected acyclic connected graph. At last, it uses aclustering algorithm based on density to identify clusters in the effective spaces.Once more, in order to solve the problem that some clusters share one grid due tothe inappropriate partition of grids, we propose a subspace clustering algorithm basedon detecting critical grids for data stream. The algorithm is composed of the onlinecomponent and the offline component. In the online component, first, it finds theeffective dimensions according to relative entropy in every dimension. Then itgenerates the subspace by the correlation among dimensions and bit vector. At last, itprojects the data into grids and updates the eigenvector of the grids. In the offline component, when clustering request arrives, it computes adaptively the threshold ofthe dense grid, identifies the critical grids according to the centrality and attraction, atlast generates the clustering results.Finally, we implement the above algorithms with java language, and use syntheticdatasets and real datasets to test the clustering quality and scalability of our algorithms,furthermore, we analyse and compare the experimental results.
Keywords/Search Tags:subspace clustering, effective space, effective dimension, critical grid, hash tree
PDF Full Text Request
Related items