Based On Sliding Window And The Grid Density Data Stream Clustering Algorithm Research

Posted on:2013-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Ou

Full Text:PDF

GTID:2248330374488295

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

In recent decades, the problem of data stream clustering has attracted much attention and has been extensively studied by many international domestic academics. However, there are still many issues unsolved, such as the problem of efficiency of algorithm, storage space and precision of clustering. Based on the research of common clustering algorithms for data streams and the analysis of advantages and disadvantages of the algorithms, this paper proposes optimized data stream clustering algorithms.A majority of data stream is characterized by high-dimensional, it is therefore inefficient to cluster the crude data streams due to its inherently complexity. Besides, the data stream has the characteristic of evolution and people tend to be more concerned about the recent data. This paper proposes an effective clustering algorithm referred as HSWStream for high dimensional data stream over sliding window. First, the algorithm uses the technique of projected clustering to descend dimension, with the efficiency improved. Second, the algorithm eliminates the influence of historical data with the technology of sliding window and maintains the summary information of clusters with exponential histogram of cluster feature. Finally, in order to bring more efficiency, the method of maintaining exponential histogram has been improved. The experimental results demonstrate that HSWStream has higher quality, less memory and better scalability than HPStream.Most of the existing grid-based data stream clustering algorithms are able to deal with the clusters with arbitrary shape, but deficiencies still exist. Actually, the data space is partitioned by many grids. Some grids on cluster edge are often regarded as sporadic ones because they contain less data points. As a result, the accuracy is reduced due to the incomplete information on cluster edge. This paper proposes a data stream clustering algorithm based on grid density and force called F-Stream. Using the CluStream double-layer framework, this algorithm divides the clustering process into online component and offline component. The online component maps new data points to the corresponding grids and updates the characteristic vectors of grids, while the offline component clusters the grids. The algorithm adopts a cluster boundary treatment technique, which is based on grid force, to improve the accuracy of clustering and make the cluster edge smoother. To judge whether the grid on cluster edge is a sporadic grid, the algorithm introduces both the distance and the density of grid as the judgment standard. The computation of grid centroid uses weighted mean instead of arithmetic mean. Consequently, the position of grid centroid can represent the space distribution of data points indirectly. Moreover, the incremental updating of the centroid saves a lot of time and space. The experimental results indicate that the F-Stream has higher clustering quality and speed compared with the CluStream. Due to the implement of boundary treatment technique, it avoids the loss of cluster boundary information and has higher accuracy than D-Stream.

Keywords/Search Tags:

data stream, clustering, sliding window, grid density, gridforce

PDF Full Text Request

Related items

1	Research On Data Stream Clustering Algorithm Based On Density Grid Over Sliding Window
2	Research On Density Data Stream Clustering Algorithm Based On Sliding Window
3	Research On Data Stream Clustering Algorithm Based On Sliding Windows And Subspace Partition
4	Research On Density-based Subspace Clustering Algorithm For Data Streams
5	Research On Density-Based Subspace Clustering Algorithm For Data Streams
6	Algorithm For Clustering Data Streams Based On Density Units Covered
7	Research On Uncertain Data Streams Clustering Algorithm Based On Tuple Cluster Feature
8	Research On Clustering Of Stream Data Based On Sliding Window
9	Study Of Cluster Algorithm Of Data Stream
10	Research On Data Stream Clustering Algorithm Based On Density Grid