Research On Density-based Subspace Clustering Algorithm For Data Streams

Posted on: 2011-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Cao
Full Text: PDF
GTID: 2198330338491107
Subject: Computer application technology
Abstract/Summary:
An analysis of existing subspace clustering methods reveals three problems. First, few of them are designed to cluster data streams. Second, most algorithms for clustering data streams consider the decaying nature of the data stream. Third, existing density-based subspace clustering algorithms suffer from dimensionality bias. To address these problems, this thesis studies density-based subspace clustering of data streams. Solving these problems is meaningful for the life sciences, e-commerce, business intelligence, and other fields.

First, SDSStream, an efficient method for density-based subspace clustering of data streams over sliding windows, is presented. The algorithm proposes a new weighted sliding window model, within which EHCF and TCF are defined. Potential micro-clusters and outlier micro-clusters are stored as EHCFs and kept up to date by maintaining those EHCFs. A new strategy for deleting outliers is proposed. The final clusters of arbitrary shape are generated from all the p-micro-clusters by SUBCLU.

Second, DS-Stream, an algorithm for subspace clustering of data streams over decayed windows, is proposed. A tree structure mirrors the partitioning of the data space and maintains summary information about the data stream. The idea of density-based clustering is applied in the algorithm. Based on the new data structure, different density thresholds are applied to subspaces of different dimensionality. A k-dimensional cluster consists of consecutive dense grid cells. The algorithm can find the real clusters even in an environment with noisy data elements.

Finally, both algorithms are implemented in Visual C++, and their clustering accuracy and performance are evaluated experimentally. The experimental results show that SDSStream achieves higher clustering quality than CluStream applied to a specific subspace, and that DS-Stream outperforms the traditional clustering algorithm and finds the real clusters even in noisy environments. The proposed algorithms solve the problems from which existing algorithms suffer, and the expected results are obtained.
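
The abstract does not give formulas for the decayed window or for the per-dimensionality density thresholds. As a rough illustration only, the sketch below shows one common way such ideas are realized: a grid cell keeps an exponentially decayed point count, and the density threshold is lowered as the subspace dimensionality k grows. The names `GridCell`, `density_threshold`, and `is_dense`, the parameters `decay`, `base`, and `shrink`, and the geometric threshold rule are all assumptions made for this example; they are not taken from DS-Stream.

```python
from dataclasses import dataclass


@dataclass
class GridCell:
    """Hypothetical grid cell holding an exponentially decayed point count."""
    count: float = 0.0
    last_update: int = 0

    def add_point(self, now: int, decay: float) -> None:
        # Fade the stored count by decay**(elapsed time), then add the new point.
        self.count = self.count * decay ** (now - self.last_update) + 1.0
        self.last_update = now

    def density(self, now: int, decay: float) -> float:
        # Decayed count as seen at time `now`, without modifying the cell.
        return self.count * decay ** (now - self.last_update)


def density_threshold(base: float, shrink: float, k: int) -> float:
    # Assumed rule: the threshold shrinks geometrically with dimensionality k,
    # since points spread more thinly over cells in higher-dimensional subspaces.
    return base * shrink ** (k - 1)


def is_dense(cell: GridCell, now: int, decay: float,
             base: float, shrink: float, k: int) -> bool:
    # A cell counts as dense when its decayed count reaches the threshold
    # for a k-dimensional subspace.
    return cell.density(now, decay) >= density_threshold(base, shrink, k)
```

Using a single fixed threshold for every k would tend to favor low-dimensional subspaces, which is one form of the dimensionality bias the abstract mentions; a dimension-dependent threshold is one way to counter it.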
Keywords/Search Tags: data stream, subspace clustering, density, sliding window, exponential histogram, decayed window, tree structure