Research On Data Stream Clustering Algorithm Based On Density Grid

Posted on:2012-11-27

Degree:Master

Type:Thesis

Country:China

Candidate:Y Mi

Full Text:PDF

GTID:2218330338467517

Subject:Computer system architecture

Abstract/Summary:

Data mining is the extraction, or "mining", of knowledge from large amounts of data; more specifically, it analyzes data to uncover the essential characteristics and general laws behind it. As an important data mining method, clustering has been widely used in many fields. Clustering divides a collection of physical or abstract objects into classes according to some similarity criterion, so that objects in the same class are similar to one another. Clustering can reveal the global distribution pattern of the data and interesting correlations among object attributes.

In recent years, with the development of computer and communications technology, large volumes of data streams have been generated across industries. Such data arrives at high speed, is unbounded in volume, changes dynamically, and is unpredictable. All these features constrain clustering on data streams. Many scholars have studied data stream clustering, but considerable room for improvement remains.

Compared with other methods, clustering based on grid and density has particular advantages, such as high computing speed and the ability to find clusters of arbitrary shape, which make it well suited to data streams. The density threshold is a crucial parameter of grid- and density-based clustering algorithms and significantly affects clustering quality. However, users who lack domain knowledge and prior information about the data can hardly determine this parameter. In this thesis, the grid density threshold is determined by the average-density method, based on an analysis of the grid densities of the initial data distribution. During data stream processing, the density threshold is adjusted dynamically to adapt to the changing nature of the stream.
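The average-density idea can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes the simplest reading: the density of a grid cell is the number of points falling in it, and the threshold is the mean density over non-empty cells. The function names and the `cell_width` parameter are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import floor


def grid_densities(points, cell_width):
    """Count the points falling in each grid cell (cell index tuple -> count)."""
    counts = Counter()
    for p in points:
        cell = tuple(floor(x / cell_width) for x in p)
        counts[cell] += 1
    return counts


def average_density_threshold(densities):
    """Mean density over non-empty cells (assumed stand-in for the thesis's rule)."""
    if not densities:
        return 0.0
    return sum(densities.values()) / len(densities)


def dense_cells(densities, threshold):
    """Cells whose density reaches the threshold."""
    return {cell for cell, d in densities.items() if d >= threshold}


# Initial data: three points share cell (0, 0); two cells hold one point each.
initial = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (1.5, 1.5), (2.7, 0.3)]
densities = grid_densities(initial, cell_width=1.0)
threshold = average_density_threshold(densities)
dense = dense_cells(densities, threshold)

# Dynamic adjustment (assumed form): fold new arrivals into the counts
# and recompute the threshold from the updated densities.
densities.update(grid_densities([(1.2, 1.7), (1.8, 1.1)], cell_width=1.0))
updated_threshold = average_density_threshold(densities)
```

Recomputing the mean after each batch is only one possible way to adjust the threshold as the stream evolves; the thesis may use decay factors or another update rule.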
A common problem in grid-based clustering is that cluster boundaries are difficult to locate precisely, because grid-based methods discard the original data and operate only on grid cells. To improve boundary accuracy, this thesis retains a moderate amount of information about the data and subdivides the grids on the cluster boundary. In most grid-based clustering algorithms, clusters are formed starting from grids chosen in random order, which produces a large number of small, meaningless clusters. To solve this problem, the grid cell with the highest density is chosen as the starting point for forming a cluster, which helps recover the original structure of the cluster.

Building on previous research, a data stream clustering algorithm that improves on the D-Stream algorithm is proposed. Experimental results on synthetic and real data show that the algorithm achieves good clustering quality.
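Seeding cluster formation at the densest grid cell can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the thesis's algorithm: cells are integer index tuples, neighbors are face-adjacent cells, and a cluster grows through adjacent cells whose density reaches a fixed threshold. The names `neighbors` and `form_clusters` are hypothetical.

```python
from collections import deque


def neighbors(cell):
    """Yield face-adjacent cells (differing by 1 in exactly one dimension)."""
    for i in range(len(cell)):
        for step in (-1, 1):
            yield cell[:i] + (cell[i] + step,) + cell[i + 1:]


def form_clusters(densities, threshold):
    """Grow clusters from the densest unassigned cell through adjacent dense cells."""
    dense = {c for c, d in densities.items() if d >= threshold}
    labels = {}
    next_label = 0
    # Visit candidate seeds in order of decreasing density (ties broken by index).
    for seed in sorted(dense, key=lambda c: (-densities[c], c)):
        if seed in labels:
            continue  # already absorbed by a cluster grown from a denser seed
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            for nb in neighbors(cell):
                if nb in dense and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


# Two groups of adjacent dense cells plus one sparse cell at (9, 9).
densities = {(0, 0): 9, (0, 1): 5, (1, 1): 4, (5, 5): 7, (5, 6): 3, (9, 9): 1}
labels = form_clusters(densities, threshold=3)
```

With a single fixed threshold, as assumed here, the seed order only decides which cluster is discovered first; membership matches plain connected components. The abstract does not state the expansion rule under which the seed choice changes the clusters themselves.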

Keywords/Search Tags:

Data Mining, Clustering Analysis, Data Stream, Density Grid, Non-uniform Division Grid

Related items

1	The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream
2	Research On Clustering Analysis Algorithm For Real Time Data Stream
3	Research On Data Stream Clustering Algorithm Based On Similarity And Grid Partition Optimization
4	The Research Of Clustering Algorithm Based On Data Stream
5	Data Stream Algorithm With Non-uniform Grid And Its Application In Traceability System
6	Research On Data Stream Clustering Algorithm Based On Grid And Density
7	Research On Data Clustering Based On Grid
8	Research On Dynamic Measurement Based Data Stream Clustering And Its Applications
9	Research On Clustering Method Of Datastream Based On Grid And Density
10	Research On An Effective Self Adapted Grid-Density Based Clustering