Density Tree-based Clustering Algorithm For Uncertain Data Streams

Posted on: 2016-12-12  Degree: Master  Type: Thesis
Country: China  Candidate: Z D Xing  Full Text: PDF
GTID: 2348330542475888  Subject: Engineering
Abstract/Summary:
With the continued development of research on traditional data stream clustering and further study of data uncertainty, scholars have proposed clustering algorithms for uncertain data streams. Because these streams carry a description of uncertainty, traditional data stream clustering algorithms are not fully applicable to them, and clustering technology faces higher requirements and new challenges. This thesis studies several uncertain data stream clustering algorithms proposed by previous scholars and summarizes their advantages and disadvantages. Because density- and grid-based clustering algorithms for uncertain data streams suffer from low grid utilization and from problems with grid partitioning and density thresholds, this thesis proposes and implements UD-Tree (Uncertain Density Tree), an uncertain data stream clustering algorithm based on a density tree. The algorithm adopts the processing framework of the CluStream algorithm and divides clustering into an online process and an offline process. The online process quickly handles the continuously arriving uncertain data. Different attributes correspond to different layers of the tree, and every node in a layer uses the same partition, so each uncertain data point is mapped to a leaf node and the leaves together form the density tree structure. This method largely eliminates empty grid cells and improves both space utilization and the quality of the clustering result. Following the principle that recent data is more important, a time regression model is adopted, and the concept of the probability density of a leaf node is defined on top of it. A probability density leaf feature vector stores the summary information of the uncertain data. An update cycle and an isolated-leaf-node function reduce the amount of computation and improve the efficiency of the algorithm. The offline process produces a more accurate clustering from the summary information kept by the online process: according to its probability density, each leaf node is classified as dense, transition, or sparse, and adjacent dense leaf nodes are merged to form clusters. Experiments on the real datasets Forest CoverType and KDD Cup 1999 show that the proposed algorithm improves on the EMicro algorithm in clustering accuracy and quality.
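The two-phase idea described in the abstract can be illustrated with a small Python sketch. It is not the thesis's implementation: the class and function names, the equal-width partition of attributes scaled to [0, 1], the exponential decay factor standing in for the time model, the two thresholds, and the adjacency rule (leaves whose interval index differs by one in a single attribute) are all assumptions made for illustration. The update cycle and the isolated-leaf-node function are not modeled.

```python
from collections import deque


class DensityTree:
    """Toy density tree: one level per attribute, k equal-width intervals per level."""

    def __init__(self, n_attrs, k=4, decay=0.9):
        self.n_attrs = n_attrs
        self.k = k            # intervals per attribute (assumed identical on every level)
        self.decay = decay    # assumed exponential decay factor per time unit
        self.root = {}        # nested dicts; only branches that received data exist
        self.leaves = {}      # path tuple -> [density, last_update_time]

    def _path(self, point):
        # Interval index of each attribute; values are assumed to lie in [0, 1].
        return tuple(min(int(x * self.k), self.k - 1) for x in point)

    def insert(self, point, prob, t):
        """Online step: map a point with existence probability `prob` to its leaf."""
        path = self._path(point)
        node = self.root
        for idx in path[:-1]:
            node = node.setdefault(idx, {})
        leaf = node.setdefault(path[-1], [0.0, t])
        # Fade the stored density to time t, then add the new point's probability.
        leaf[0] = leaf[0] * self.decay ** (t - leaf[1]) + prob
        leaf[1] = t
        self.leaves[path] = leaf
        return path

    def density(self, path, now):
        d, last = self.leaves[path]
        return d * self.decay ** (now - last)


def classify(tree, now, dense_thr, sparse_thr):
    """Label every leaf as dense, transition, or sparse by its decayed density."""
    labels = {}
    for path in tree.leaves:
        d = tree.density(path, now)
        if d >= dense_thr:
            labels[path] = "dense"
        elif d <= sparse_thr:
            labels[path] = "sparse"
        else:
            labels[path] = "transition"
    return labels


def merge_dense(labels):
    """Offline step: group adjacent dense leaves into clusters (breadth-first search)."""
    dense = {p for p, lab in labels.items() if lab == "dense"}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:
            p = queue.popleft()
            cluster.append(p)
            for i in range(len(p)):
                for step in (-1, 1):
                    q = p[:i] + (p[i] + step,) + p[i + 1:]
                    if q in dense and q not in seen:
                        seen.add(q)
                        queue.append(q)
        clusters.append(sorted(cluster))
    return clusters


tree = DensityTree(n_attrs=2, k=4, decay=0.5)
tree.insert((0.1, 0.1), prob=1.0, t=0)
tree.insert((0.1, 0.1), prob=1.0, t=1)
tree.insert((0.3, 0.1), prob=1.0, t=1)
tree.insert((0.9, 0.9), prob=0.2, t=1)
labels = classify(tree, now=1, dense_thr=1.0, sparse_thr=0.3)
clusters = merge_dense(labels)
```

Because leaves are created only when a point arrives, regions of the attribute space that never receive data occupy no memory, which is the sense in which a tree of this kind avoids empty grid cells.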
Keywords/Search Tags:uncertain data stream, clustering, density tree, leaf node