Research On Data Stream Clustering Based On Grid And Density

Posted on:2010-02-20

Degree:Master

Type:Thesis

Country:China

Candidate:M Li

GTID:2178360275953317

Subject:Computer software and theory

Abstract/Summary:

In recent years, the rapid development of computing and application technology has greatly improved people's ability to acquire data. Data streams are an important type of data source and are attracting growing attention. Stream data is continuous, ordered, fast-changing, and massive in volume, which makes it quite different from traditional static data stored on disk. Data stream mining has therefore become an active research field, and data stream clustering is one of its most active topics. One goal of this thesis is to design and develop a data stream clustering algorithm that is both accurate and fast. To this end, the thesis discusses the background and related work on data stream mining, summarizes popular clustering algorithms, and studies the characteristics of data streams and the key technical issues in data stream clustering. On this basis, we propose the GDE-Stream (Grid and Density based Evolving Stream) algorithm, a framework based on grid and density. By modifying the synopsis data structure, the algorithm gains the following characteristics.

1. Borrowing the framework of the CluStream algorithm, GDE-Stream is divided into an online layer and an offline layer. The online layer reads the data stream rapidly and stores the relevant information in a synopsis data structure. Using this synopsis, the offline layer provides accurate clustering. The two layers work together to balance accuracy and speed.

2. The system preserves the characteristics of the data stream with grids. In addition to summary statistics, the grids also record the spatial information of the data stream, which reduces information loss.

3. In the online layer, using the spatial information in the synopsis data structure, the Online-Read algorithm compares the distances between a new record and the relevant grids and maps the record to the correct grid, which partly solves the problem of information loss at grid edges.

4. In the offline layer, a density-based clustering algorithm is used, so that the system can identify clusters of arbitrary shape.

With the concepts of grid frame and evolution difference, the system can also support clustering over the stream's history and analysis of its evolution. Experiments on both synthetic and real datasets show that the algorithm is applicable and accurate and can cluster data streams efficiently.
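
The two-layer idea described above can be illustrated with a minimal Python sketch. It is not the thesis's GDE-Stream implementation, whose details the abstract does not give. It is a generic grid-and-density illustration under stated assumptions: the names GridSynopsis, insert, and cluster_dense_cells, the cell width, the density threshold, and the nearest-centroid rule for records near a cell edge are all hypothetical choices made for this example.

```python
import math
from collections import deque
from itertools import product

CELL_WIDTH = 1.0      # hypothetical grid resolution (assumption)
DENSE_THRESHOLD = 3   # hypothetical minimum record count for a dense cell


def home_cell(point, width=CELL_WIDTH):
    """Return the integer grid coordinates of the cell containing point."""
    return tuple(math.floor(x / width) for x in point)


def neighbors(key):
    """Yield every cell adjacent to key (diagonals included), excluding key."""
    for offset in product((-1, 0, 1), repeat=len(key)):
        if any(offset):
            yield tuple(k + o for k, o in zip(key, offset))


class GridSynopsis:
    """Online layer: per-cell count and linear sum (summary + spatial info)."""

    def __init__(self, width=CELL_WIDTH):
        self.width = width
        self.cells = {}  # cell key -> (count, per-dimension linear sum)

    def _reference_point(self, key):
        """Centroid of the stored records, or the cell center if it is empty."""
        if key in self.cells:
            count, sums = self.cells[key]
            return [s / count for s in sums]
        return [(k + 0.5) * self.width for k in key]

    def insert(self, point):
        """Map a record to its home cell or a nearer non-empty neighbor."""
        home = home_cell(point, self.width)
        candidates = [home] + [n for n in neighbors(home) if n in self.cells]
        # Assumed rule: pick the candidate whose reference point is closest.
        # Ties go to the home cell because it is listed first.
        best = min(candidates,
                   key=lambda k: math.dist(point, self._reference_point(k)))
        count, sums = self.cells.get(best, (0, [0.0] * len(point)))
        self.cells[best] = (count + 1, [s + x for s, x in zip(sums, point)])
        return best


def cluster_dense_cells(synopsis, threshold=DENSE_THRESHOLD):
    """Offline layer: group adjacent dense cells into clusters (BFS)."""
    dense = {k for k, (count, _) in synopsis.cells.items() if count >= threshold}
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            key = queue.popleft()
            component.add(key)
            for n in neighbors(key):
                if n in dense and n not in seen:
                    seen.add(n)
                    queue.append(n)
        clusters.append(component)
    return clusters
```

In this sketch the online step touches only a record's home cell and its immediate neighbors, so the cost per record does not depend on how much of the stream has been seen. The offline step works only on the cell summaries, which is what lets it find clusters of arbitrary shape without revisiting raw records.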

Keywords/Search Tags:

Data Stream, Clustering, Two-tier Framework, Grid, Density

Related items

1	Research And Apply On Algorithms For Clustering Data Stream
2	Research And Improvement On Stream Data Clustering Algorithm
3	Research On Data Stream Clustering Algorithm Based On Density Grid
4	Research And Application Of Data Stream Clustering Algorithm In The Analysis Web Access Log
5	Research On Data Stream Clustering Based On Density And Grid
6	Data Stream Clustering Algorithm Based On Active Grid-density
7	Research On Clustering Method Of Datastream Based On Grid And Density
8	The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream
9	Research On Grid And Density Based Data Stream Clustering Algorithm
10	Research On Data Stream Clustering Algorithm Based On Similarity And Grid Partition Optimization