
Research On Uncertain Data Streams Clustering Algorithm Based On Tuple Cluster Feature

Posted on: 2011-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: D P Liang
GTID: 2178330338491175
Subject: Computer application technology
Abstract/Summary:
A review of existing data stream clustering algorithms, both in China and abroad, reveals the following problems. Existing algorithms for clustering uncertain data streams cannot cluster heterogeneous data streams with uncertainty. In addition, recent data in an uncertain data stream cannot be analyzed in detail during the clustering process. Furthermore, most algorithms cannot produce clusters of arbitrary shape. To address these problems, this thesis focuses on clustering uncertain data streams over sliding windows. Solving these problems is useful for applications such as location-based services and e-commerce.

Firstly, HU-Clustering, an algorithm for clustering heterogeneous data streams with uncertainty, is proposed. HU-Clustering uses a probability frequency histogram to track the statistics of categorical attributes. A heterogeneous uncertainty clustering feature is also defined to describe the distribution characteristics of heterogeneous data streams with uncertainty. To improve clustering quality, a two-phase clustering selection is presented. HU-Clustering achieves higher clustering quality than UMicro, with better execution efficiency.

Secondly, SWCUStreams, an algorithm for clustering uncertain data streams over sliding windows, is proposed in order to analyze recent data. The algorithm adopts a two-phase clustering framework. In the online component, an uncertainty temporal clustering feature is defined to describe the uncertainty information of each tuple, and an exponential histogram of uncertainty clustering features (EHUCF) is proposed to store the distribution characteristics of recent data. In the offline component, UK-means is used to generate the final clustering results from the EHUCF. SWCUStreams achieves better clustering quality.

Lastly, GD-CUStreams, which is based on grid probability density, is presented to cluster uncertain data streams over sliding windows and obtain clusters of arbitrary shape. GD-CUStreams defines an uncertainty grid clustering feature to collect uncertainty information and store the summary information of each grid. When a user submits a clustering request, the type of each grid is determined from its grid probability, and all grids meeting the requirement are output as the clustering result.
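
The grid-based idea behind GD-CUStreams can be illustrated with a minimal Python sketch. The abstract does not give the thesis's definitions, so everything concrete here is an assumption made for illustration: tuples are taken to be (x, y, p) triples with existence probability p, the sliding window is count-based, a grid's probability density is the sum of the existence probabilities that fall in it, a grid is "dense" when that sum reaches a threshold, and clusters are groups of edge-adjacent dense grids. The function name `grid_clusters` and its parameters are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict, deque


def grid_clusters(stream, window, cell_size, dense_threshold):
    """Group dense grid cells built from the newest `window` tuples.

    Each tuple is (x, y, p), where p is an ASSUMED existence probability.
    This is an illustrative sketch, not the GD-CUStreams algorithm itself.
    """
    # Count-based sliding window: only the newest `window` tuples survive.
    recent = deque(stream, maxlen=window)

    # Assumed density: sum of existence probabilities per grid cell.
    density = defaultdict(float)
    for x, y, p in recent:
        cell = (int(x // cell_size), int(y // cell_size))
        density[cell] += p

    # Assumed grid type test: a cell is dense if its density reaches the threshold.
    dense = {cell for cell, d in density.items() if d >= dense_threshold}

    # Join dense cells that share an edge, so clusters can take any shape.
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        component, frontier = [], [start]
        while frontier:
            cx, cy = frontier.pop()
            component.append((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    frontier.append(nb)
        clusters.append(sorted(component))
    return clusters
```

Because clusters are assembled from adjacent dense cells rather than from distances to a centroid, their outline follows whatever region of the grid is dense, which is how a grid-density approach can yield arbitrarily shaped clusters. A real implementation would maintain the per-grid summaries incrementally as tuples arrive and expire instead of rebuilding them on every request.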
Keywords/Search Tags: Uncertain data stream, clustering, sliding window, heterogeneous attributes, probability frequency histogram, grid probability density