
The Research On The Algorithms Of Grid-based Data Stream Clustering

Posted on: 2013-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: H D Wang
Full Text: PDF
GTID: 2248330362972203
Subject: Computer application technology
Abstract/Summary:
Data stream clustering is an important problem in data mining. Its purpose is to find clusters in a large, noisy, fuzzy, and random data stream such that the similarity of data within the same cluster is as high as possible and the similarity of data in different clusters is as low as possible. Among current clustering methods, grid-based data stream clustering performs well because of its high data compression ratio and low time complexity. However, it has a flaw of its own: data points in edge grids are easily lost, which reduces the correctness of grid clustering. Clustering high-dimensional data streams is another problem. The distances between data points in a high-dimensional data stream are almost all equal, which poses a great difficulty for algorithms that use the distance between data points as the criterion for similarity.

The main research content and results are as follows.

First, this paper introduces a grid-based data stream clustering method based on a variable density threshold (VDTS). Traditional grid-based data stream clustering algorithms usually use a fixed density threshold and divide the grid evenly, whereas VDTS uses a variable density threshold. In this method, grids in the center of a cluster are merged easily and grids at the edge of the center are hard to merge. VDTS not only retains the high data compression ratio and low time complexity of grid-based clustering but also keeps the data points on the edge of the cluster.

Second, high-dimensional data stream clustering is a difficult area of data clustering. This paper introduces a high-dimensional data stream clustering algorithm (HVDTS). A critical problem in high-dimensional data stream clustering is how to reduce the dimensionality, that is, how to choose the dimensions that are critical to the clustering. HVDTS chooses the critical dimensions by measuring the distribution of the data points projected onto each dimension. After the high-dimensional data stream has been reduced to a low-dimensional data stream, the VDTS algorithm is used to complete the clustering.
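The abstract does not say how the distribution of projected points is measured, so the following is only a minimal sketch of one possible reading and not the thesis's actual HVDTS procedure. It assumes, purely for illustration, that a dimension is "critical" when the histogram of its projected values has low Shannon entropy (the points are concentrated in a few bins rather than spread evenly). The function names `projection_entropy`, `select_dimensions`, and `project`, as well as the parameters `bins` and `k`, are hypothetical.

```python
import math
import random


def projection_entropy(values, bins=10):
    """Shannon entropy (in bits) of an equal-width histogram of 1-D values.

    ASSUMPTION: entropy stands in for the unspecified "distribution" measure.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # every point coincides: fully concentrated
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def select_dimensions(points, k, bins=10):
    """Return the indices of the k dimensions with the lowest projection entropy."""
    dims = len(points[0])
    scores = [projection_entropy([p[d] for p in points], bins) for d in range(dims)]
    ranked = sorted(range(dims), key=lambda d: scores[d])
    return sorted(ranked[:k])


def project(points, dims):
    """Keep only the selected dimensions of every point."""
    return [tuple(p[d] for d in dims) for p in points]


# Usage: dimensions 0 and 1 carry two tight clusters, dimensions 2..5 are uniform noise.
rng = random.Random(0)
points = []
for _ in range(500):
    center = rng.choice([0.2, 0.8])
    clustered = [rng.gauss(center, 0.02), rng.gauss(1 - center, 0.02)]
    noise = [rng.random() for _ in range(4)]
    points.append(clustered + noise)

chosen = select_dimensions(points, k=2)
low_dim_points = project(points, chosen)  # input for a low-dimensional grid clustering step
print(chosen)
```

In this sketch the reduced points would then be handed to a low-dimensional grid clustering step, which is the role the abstract assigns to VDTS. For simplicity the sketch works on a stored batch of points; a streaming version would have to maintain the per-dimension histograms incrementally.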
Keywords/Search Tags: data mining, data stream, clustering, grid, high dimensional