A Clustering Algorithm for Uncertain Data Streams: UStreamUKm

Posted on: 2013-11-06 | Degree: Master | Type: Thesis
Country: China | Candidate: G Z Lv
GTID: 2248330395455369 | Subject: Computer application technology
Abstract/Summary:
As stream clustering has developed, many researchers have proposed clustering methods for uncertain data streams, motivated by the uncertainty of data values in practical applications. Uncertain data streams place higher demands on a clustering algorithm, because they not only retain the characteristics of ordinary data streams, such as being unbounded and arriving rapidly, but also carry uncertainty. After studying the uncertain data stream clustering problem and comparing the classic algorithms for it, this paper presents an uncertain data stream clustering algorithm, UStreamUKm (Uncertain Stream Uncertain K-means). Adapted to the characteristics of uncertain data streams, the algorithm constructs a Coreset of the uncertain data stream to reduce the data size and therefore the execution time; at the same time, it improves clustering quality by optimizing the initial cluster centers. The main work is as follows:
(1) A bucket policy is used to process the data stream, which suits the data stream application environment, and to construct the Coreset of the uncertain data stream. The Coreset provides a small, information-rich sample for the subsequent stage of the clustering algorithm.
(2) The initial cluster centers are selected with the Max-min Cluster Distance Algorithm (MCDA), which improves the quality of clustering uncertain data streams.
(3) An outlier-handling mechanism is introduced into the clustering algorithm to reduce the impact of outliers on the clustering results.
Both the theoretical analysis and the simulation results show that the algorithm can cluster uncertain data streams effectively while remaining efficient and using limited memory.
Keywords/Search Tags: Uncertain Data Stream, Coreset of Uncertain Data Stream, Optimization of the initial cluster centers, Handling outliers
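
The abstract does not spell out how the max-min distance selection of initial centers works, so the following is only a minimal sketch of the general max-min (farthest-first) idea, not the thesis's MCDA implementation. It assumes each uncertain object has already been reduced to a single expected position, uses Euclidean distance, and seeds with the first point; the function name max_min_centers is hypothetical.

```python
import math


def max_min_centers(points, k):
    """Pick k initial centers with the max-min (farthest-first) rule.

    Assumptions (not taken from the thesis): each point is the expected
    position of an uncertain object, distance is Euclidean, and the
    first point is used as the seed center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [points[0]]
    # min_dist[i] is the distance from points[i] to its nearest chosen center.
    min_dist = [math.dist(p, centers[0]) for p in points]

    while len(centers) < k:
        # The next center is the point farthest from all centers chosen so far.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(points[idx])
        # Refresh the nearest-center distances against the new center.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[idx]))

    return centers


sample = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
print(max_min_centers(sample, 3))
```

Spreading the initial centers apart in this way tends to avoid starting k-means with several centers in the same dense region. In a streaming setting the input would presumably be the Coreset points rather than the raw stream, but that pairing is an assumption here.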