
Research on Subspace Clustering Algorithms Based on Similarity and DBSCAN

Posted on: 2014-09-14, Degree: Master, Type: Thesis
Country: China, Candidate: Y Y Bai, Full Text: PDF
GTID: 2268330422466815, Subject: Computer software and theory
Abstract/Summary:
High-dimensional data clustering is currently an important research field in data mining. Traditional clustering algorithms are not well suited to high-dimensional data, so subspace clustering, an effective approach to high-dimensional clustering, has been widely applied in finance, telecommunications, biomedicine, and other fields. Although scholars have proposed many subspace clustering algorithms, these do not adequately address the problems of subspace cluster quality and running time.

First, we propose a novel algorithm, AReSUBCLU, an effective subspace clustering algorithm based on attribute relativity and DBSCAN. Active attributes are defined to reduce the dimensionality of the data, and a subspace search tree, constructed from an attribute correlation matrix, is defined to generate subspaces. The method uses DBSCAN with a weighted-threshold technique to cluster on every active attribute. The correlation matrix is constructed by computing the covariance of attributes, and the subspace search tree is built from this matrix. The set of nodes on each root branch of the tree forms an interesting subspace discovered by the algorithm. The subspace clusters are obtained by merging the clusters of the active dimensions according to cluster similarity.

Second, a spatial tree clustering algorithm based on DBSCAN is proposed. The algorithm uses DBSCAN to generate one-dimensional clusters on each active attribute, and a dimension entropy is designed to select the appropriate splitting attribute. The dataset is split into smaller datasets according to the clusters of the splitting attribute, and the spatial tree is then constructed by calling the algorithm recursively until the termination condition is satisfied. The algorithm obtains the subspace clusters from the spatial tree structure: the attributes at the nodes along each branch of the tree form an interesting subspace found by the algorithm, and the leaf node on the branch contains the clusters in that subspace. The time consumption of the algorithm is greatly reduced.

Finally, the experiments are implemented in Java using Eclipse. The experimental results show good performance.
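
Both algorithms rest on two building blocks named in the abstract: DBSCAN run on a single attribute, and an attribute correlation matrix derived from covariances. The following is a minimal Python sketch of those two pieces only, not the thesis implementation (which was written in Java). The function names `dbscan_1d` and `correlation_matrix`, the plain `eps` / `min_pts` parameters, and the use of Pearson normalisation are assumptions made for illustration; the weighted-threshold technique, the subspace search tree, and the dimension entropy are not specified in the abstract and are not reproduced here.

```python
from bisect import bisect_left, bisect_right
from math import sqrt

NOISE = -1  # label used for points that belong to no cluster


def dbscan_1d(values, eps, min_pts):
    """DBSCAN on one attribute: returns a cluster id (0, 1, ...) or NOISE per value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]

    # A point is a core point if at least min_pts values (itself included)
    # lie within eps of it; on sorted data this is two binary searches.
    is_core = [
        bisect_right(xs, x + eps) - bisect_left(xs, x - eps) >= min_pts
        for x in xs
    ]

    # Consecutive core points no more than eps apart are density-connected,
    # so a gap larger than eps between core points starts a new cluster.
    sorted_labels = [NOISE] * len(xs)
    core_pos, cluster = [], -1
    for pos, x in enumerate(xs):
        if not is_core[pos]:
            continue
        if not core_pos or x - xs[core_pos[-1]] > eps:
            cluster += 1
        sorted_labels[pos] = cluster
        core_pos.append(pos)

    # Border points: non-core points within eps of a core point join its cluster.
    core_xs = [xs[p] for p in core_pos]
    for pos, x in enumerate(xs):
        if is_core[pos]:
            continue
        k = bisect_left(core_xs, x)
        for cand in (k - 1, k):
            if 0 <= cand < len(core_xs) and abs(core_xs[cand] - x) <= eps:
                sorted_labels[pos] = sorted_labels[core_pos[cand]]
                break

    # Map labels back to the original input order.
    labels = [NOISE] * len(values)
    for pos, i in enumerate(order):
        labels[i] = sorted_labels[pos]
    return labels


def correlation_matrix(rows):
    """Pearson correlation between attributes, computed from sample covariances."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
         for b in range(d)]
        for a in range(d)
    ]
    # Assumption: a constant attribute (zero variance) gets correlation 0.0.
    return [
        [cov[a][b] / sqrt(cov[a][a] * cov[b][b])
         if cov[a][a] > 0 and cov[b][b] > 0 else 0.0
         for b in range(d)]
        for a in range(d)
    ]
```

For example, `dbscan_1d([1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0], eps=0.3, min_pts=3)` separates the two dense groups and marks the isolated value as noise. In a pipeline like the one the abstract describes, the per-attribute cluster labels and the pairwise correlations would then feed the subspace search, but how they are combined is left to the full thesis.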
Keywords/Search Tags: subspace cluster, active attribution, similarity, subspace search tree, DBSCAN