Research On Grid And Density Based Data Stream Clustering Algorithm

Posted on: 2011-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Du
Full Text: PDF
GTID: 2178330332960372
Subject: Computer software and theory
Abstract/Summary:
In recent years, massive and potentially infinite data streams have been generated by real-time surveillance systems, communication networks, web page clicks, online transactions in financial markets, and other dynamic environments. Unlike traditional data sets, data streams are temporally ordered, fast changing, massive, and potentially infinite. To discover knowledge or patterns in data streams, it is necessary to develop single-scan, online, multilevel, multidimensional stream processing and analysis methods.

Many data stream clustering methods have been proposed in academia, but many issues remain to be studied and resolved. Generally speaking, the grid-based clustering algorithm, which partitions the data space into a finite number of cells to form a grid structure and then performs all clustering operations on that grid structure, is effective and efficient compared with other clustering algorithms (such as density-based and partitioning methods), but its precision is seriously influenced by the size of the cells.

To cluster efficiently while reducing the influence of the cell size, this paper proposes a new grid and density based data stream clustering algorithm, called the PGD algorithm. The main idea of the PGD algorithm is to cluster the same data stream concurrently on two grid structures whose granularities are close to each other. This paper calls the grid with the larger granularity the original grid structure and the other the revised grid structure. The revised grid structure can be regarded as a dynamic adjustment of the size of the original cells, so the clusters generated from the revised grid structure can be used to revise the clusters from the original grid structure and thereby improve precision.
The experimental results verify that the PGD algorithm is indeed less influenced by the size of the cells, and that it is both effective and efficient.
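
The two-grid idea described in the abstract can be illustrated with a minimal sketch. This is not the PGD algorithm itself: the class name `GridCounter`, the cell sizes 1.0 and 0.8, the density threshold `min_points`, and the rule that adjacent dense cells form one cluster are all assumptions made for illustration. The abstract does not state how the revised clusters correct the original ones, so that step is left out. The sketch only shows a single pass over a stream that updates two grids of similar cell size, followed by cluster extraction on each grid.

```python
from collections import defaultdict
from itertools import product


def cell_of(point, size):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(int(x // size) for x in point)


class GridCounter:
    """Hypothetical helper: per-cell point counts for one grid structure."""

    def __init__(self, size):
        self.size = size
        self.counts = defaultdict(int)

    def add(self, point):
        # Single-scan update: each arriving point touches exactly one cell.
        self.counts[cell_of(point, self.size)] += 1

    def clusters(self, min_points):
        """Group dense cells (count >= min_points) that touch each other."""
        dense = {c for c, n in self.counts.items() if n >= min_points}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, group = [start], set()
            while stack:
                cell = stack.pop()
                group.add(cell)
                # Visit every neighbouring cell, including diagonal ones.
                for offset in product((-1, 0, 1), repeat=len(cell)):
                    neighbour = tuple(c + o for c, o in zip(cell, offset))
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
            result.append(group)
        return result


# Assumed toy stream: two well-separated groups of 2-D points.
stream = [(0.1, 0.2), (0.4, 0.3), (0.6, 0.7), (0.9, 0.8), (5.1, 5.2), (5.3, 5.4)]

original = GridCounter(size=1.0)  # assumed cell size of the original grid
revised = GridCounter(size=0.8)   # assumed, slightly different cell size

for p in stream:  # the same stream feeds both grids in one pass
    original.add(p)
    revised.add(p)

original_clusters = original.clusters(min_points=2)
revised_clusters = revised.clusters(min_points=2)
```

Comparing `original_clusters` with `revised_clusters` shows how a small change in cell size moves points across cell boundaries, which is the sensitivity that the revised grid structure is meant to compensate for.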
Keywords/Search Tags:Data stream, Clustering, Grid, Granularity, Parallel