The Research On The Algorithms Of Grid-based Data Stream Clustering

Posted on:2013-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:H D Wang

Full Text:PDF

GTID:2248330362972203

Subject:Computer application technology

Abstract/Summary:

Data stream clustering is an important problem in data mining. Its purpose is to find clusters in a large, noisy, fuzzy and random data stream such that the similarity of the data within the same cluster is as high as possible and the similarity between different clusters is as low as possible. Among existing clustering methods, grid-based data stream clustering performs well because of its high data compression ratio and low time complexity. However, it has a flaw of its own: data points in the grids on the edge of a cluster are easily lost, which reduces the accuracy of grid clustering. Clustering high dimensional data streams is another problem. The distances between data points in a high dimensional data stream are almost all equal, which creates great difficulty for algorithms that use the distance between data points as the criterion of similarity.

The main research content and results are as follows.

First, this paper introduces VDTS, a grid-based data stream clustering algorithm based on a variable density threshold. Traditional grid-based data stream clustering algorithms usually use a fixed density threshold and divide the grid evenly, whereas VDTS uses a variable density threshold. In this method, the grids in the center of a cluster are merged easily and the grids on the edge of the center are hard to merge. VDTS retains the high data compression ratio and low time complexity of grid clustering and also keeps the data points on the edge of the cluster.

Second, high dimensional data stream clustering is a difficult area of data clustering. This paper introduces HVDTS, a high dimensional data stream clustering algorithm. A critical problem in high dimensional data stream clustering is how to reduce the dimensionality, that is, how to choose the dimensions that are critical to the clustering. HVDTS chooses the critical dimensions by measuring the distribution of the data points projected onto each dimension. After the high dimensional data stream has been reduced to a low dimensional data stream, the VDTS algorithm is used to complete the clustering.
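
The abstract does not give the exact rules used by VDTS or HVDTS, so the following Python sketch only illustrates the two general ideas under stated assumptions; it is not the thesis's algorithm. Assumptions introduced here: dimensions are ranked by the normalized histogram entropy of their projections (lower entropy is treated as more useful for clustering); cells at or above a hypothetical `core_threshold` merge freely with adjacent core cells; sparser cells at or above a lower `edge_threshold` are kept only when they touch a core cell and never link clusters on their own. All function and parameter names are hypothetical, and the sketch works on a batch of points rather than a true stream (no decay or incremental update).

```python
from collections import Counter, deque
import math


def projection_entropy(values, bins=10):
    """Normalized histogram entropy of one projection (assumes bins >= 2).

    Close to 0 when points pile into few bins, close to 1 when uniform.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = Counter(
        min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values
    )
    n = len(values)
    h = -sum(c / n * math.log(c / n) for c in counts.values())
    return h / math.log(bins)


def select_dimensions(points, k, bins=10):
    """Assumed criterion: keep the k dimensions with the lowest entropy."""
    dims = range(len(points[0]))
    ranked = sorted(
        dims, key=lambda d: projection_entropy([p[d] for p in points], bins)
    )
    return sorted(ranked[:k])


def cell_of(point, dims, width):
    """Map a point to its grid cell over the selected dimensions."""
    return tuple(math.floor(point[d] / width) for d in dims)


def neighbors(cell):
    """Cells that differ by one step along exactly one axis."""
    for i in range(len(cell)):
        for step in (-1, 1):
            yield cell[:i] + (cell[i] + step,) + cell[i + 1:]


def grid_cluster(points, dims, width, core_threshold, edge_threshold):
    """Label each point with a cluster id, or -1 for noise."""
    counts = Counter(cell_of(p, dims, width) for p in points)
    core = {c for c, n in counts.items() if n >= core_threshold}

    # Core cells merge easily: flood-fill over adjacent core cells.
    labels = {}
    next_label = 0
    for start in sorted(core):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for nb in neighbors(cell):
                if nb in core and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Edge cells merge reluctantly: they need a lower density, must touch
    # a core cell, and are never used to extend a cluster further.
    for cell, n in counts.items():
        if cell in core or n < edge_threshold:
            continue
        core_nbs = [nb for nb in neighbors(cell) if nb in core]
        if core_nbs:
            densest = max(core_nbs, key=lambda nb: (counts[nb], nb))
            labels[cell] = labels[densest]

    return [labels.get(cell_of(p, dims, width), -1) for p in points]
```

In this sketch, setting `edge_threshold` equal to `core_threshold` reproduces a fixed-threshold grid clustering, in which the sparse cells on a cluster boundary are labeled as noise. A typical call would first pick dimensions with `select_dimensions(points, k)` and then pass the result as `dims` to `grid_cluster`.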

Keywords/Search Tags:

data mining, data stream, clustering, grid, high dimensional

Related items

1	Research On Clustering Algorithm Over High Dimensional Data Stream Based On Irregular Grid Data
2	Research On Clustering Algorithm Over High Dimensional Data Stream Based On Grid And Sequence Data
3	A High Dimensional Data Stream Clustering Algorithm Of Quick Dimension Reduction
4	Research On Data Stream Clustering Algorithm Based On Density Grid
5	Research On Dynamic Measurement Based Data Stream Clustering And Its Applications
6	Research On Clustering Method Of Datastream Based On Grid And Density
7	Research On Clustering Algorithm Of Data Stream
8	Research And Implementation On Key Technology Of Data Stream Mining
9	The Analysis And Application Of Clustering Algorithm For Multi-Dimensional Data Streams
10	The Research Of Clustering Algorithm Based On Data Stream