Research on Probability Density Grid-Based Clustering for Uncertain Data Streams

Posted on: 2012-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: L J Chen
Full Text: PDF
GTID: 2348330536954198
Subject: Computer application technology
Abstract/Summary:
In recent years, clustering algorithms for uncertain data streams have been extensively studied, but many issues remain to be researched and resolved. Most existing uncertain data stream clustering algorithms cannot generate accurate final clustering results. Traditional grid-based clustering algorithms that use a fixed meshing method suffer from low clustering accuracy, and they also lack an effective storage structure for probability density grid cells. Solving these problems is important for optimizing and applying clustering algorithms for uncertain data streams.

First, to cluster uncertain data streams online, this thesis proposes a novel algorithm, PDG-OCUStream (Probability Density Grid-based Online Clustering for Uncertain Data Streams), in which a count-based sliding window is introduced to reflect the current state of the uncertain data stream. A grid probability density similarity is defined to initialize and update clusters. In addition, to obtain clusters of arbitrary shape, the algorithm adopts a storage structure based on the probability density grid, and clustering quality can be effectively controlled by setting a probability density threshold.

Second, to improve the accuracy of clustering uncertain data streams, this thesis proposes a novel algorithm, APDG-CUStream (Adjustable Probability Density Grid-based Clustering Algorithm). It adopts an adjustable probability density grid structure to enhance clustering accuracy. A Probability Density Grid Clustering Feature is defined to store summary information about the uncertain data stream, and a time decay factor is introduced to reduce the influence of outdated data on the clustering results.

Lastly, to store probability density grid cells effectively, this thesis proposes a novel algorithm, PDGT-CUStream (Clustering Uncertain Data Streams based on Probability Density Grid-Tree), in which a tree summary data structure is introduced into the setting of clustering uncertain data streams. Uncertain tuples are first assigned to a probability density tree, which eliminates the impact of empty grids on the clustering results. A time interval is set to reduce computation and improve the efficiency of the algorithm, and a noise threshold function is introduced to find noise leaf nodes effectively.

The feasibility and effectiveness of the proposed algorithms are verified through experiments, in which they are analyzed and compared with classical algorithms and methods.
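
The abstract does not give formulas, so the following is only a minimal Python sketch of the general idea behind a count-based sliding window over a probability density grid. It is not the thesis's PDG-OCUStream implementation. The class name `SlidingDensityGrid`, its methods, and the definition of a cell's probability density as the sum of the existence probabilities of the tuples it currently holds are all assumptions made for illustration. The grid probability density similarity mentioned above is not defined in the abstract, so the sketch substitutes plain adjacency of dense cells when forming clusters.

```python
from collections import deque, defaultdict


class SlidingDensityGrid:
    """Hypothetical sketch; names and density definition are assumptions."""

    def __init__(self, cell_size, window_size):
        self.cell_size = cell_size
        self.window_size = window_size
        self.window = deque()              # (cell, probability) per tuple in the window
        self.density = defaultdict(float)  # cell -> summed existence probability (assumed)
        self.count = defaultdict(int)      # cell -> number of tuples in the window

    def _cell(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(int(x // self.cell_size) for x in point)

    def insert(self, point, prob):
        # Count-based window: evict the oldest tuple once the window is full.
        if len(self.window) == self.window_size:
            old_cell, old_prob = self.window.popleft()
            self.density[old_cell] -= old_prob
            self.count[old_cell] -= 1
            if self.count[old_cell] == 0:  # drop empty cells entirely
                del self.density[old_cell]
                del self.count[old_cell]
        cell = self._cell(point)
        self.window.append((cell, prob))
        self.density[cell] += prob
        self.count[cell] += 1

    def dense_cells(self, threshold):
        # Cells whose accumulated probability density reaches the threshold.
        return {c for c, d in self.density.items() if d >= threshold}

    def clusters(self, threshold):
        # Group dense cells that share a face (assumed stand-in for the
        # thesis's similarity measure) into connected components.
        dense = self.dense_cells(threshold)
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], set()
            while stack:
                cell = stack.pop()
                component.add(cell)
                for axis in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            result.append(component)
        return result
```

In this sketch, raising the threshold passed to `clusters` keeps only cells with higher accumulated probability, which is one way to read the abstract's remark that clustering quality is controlled by a probability density threshold. Because clusters are unions of adjacent cells, they can take arbitrary shapes at the resolution of the grid.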
Keywords/Search Tags: uncertain data streams, clustering, online clustering, adjustable probability density grid, probability density grid tree