
Study Of Real Time Data Stream Clustering Based On Damped Window And Pruning Dimension Tree

Posted on: 2010-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: W Ceng
Full Text: PDF
GTID: 2178360278457591
Subject: Computer application technology
Abstract/Summary:
Mining real-time data streams is an active research topic in data mining and databases, and clustering analysis of real-time data streams is one of the most challenging problems in this area. This thesis first introduces the research background and the main research branches of the field, such as clustering, classification, frequent item mining and association rule analysis. It then surveys recent developments in real-time data stream clustering and introduces the related theory and common techniques. The strengths and weaknesses of representative algorithms of different kinds are analyzed systematically, and their performance is compared in five aspects: execution speed, cluster shape, evolution analysis, handling of high-dimensional data, and robustness to noise. Data stream evolution analysis based on clustering, and its limitations, are also presented.

To address problems of existing algorithms, such as low processing speed, high resource consumption and the inability to handle clusters of arbitrary shape, this thesis proposes a novel real-time data stream clustering algorithm called PDStream. PDStream is based on a damped window and a density dimension tree. It first divides the data space into grids and maps the incoming data points into the grid space in order of arrival. An improved dimension tree structure is then used to maintain and update the synopsis data structure of the data stream, and a pruning strategy periodically removes sparse grids from the dimension tree. Finally, depth-first search is used to answer online clustering requests. Evolution analysis of the data stream is carried out by comparing the clustering results obtained at different times.

Experimental results on a synthetic dataset and a real dataset demonstrate that PDStream can effectively discover clusters of arbitrary shape in a data stream at any time. Its advantages include good clustering quality, low memory consumption, high processing speed and good precision.
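
The pipeline described in the abstract (grid mapping, damped-window densities, periodic pruning of sparse grids, depth-first search over dense grids) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract gives no formulas, so the exponential decay form, the thresholds, the adjacency rule (cells that touch, including diagonally) and all names (`DampedGridSketch`, `cell_width`, `decay`, and so on) are assumptions made for illustration. A plain dictionary also stands in for the improved dimension tree, whose structure the abstract does not specify.

```python
from itertools import product


class DampedGridSketch:
    """Hypothetical sketch: decayed grid densities kept in a dict,
    not in the dimension tree used by PDStream."""

    def __init__(self, cell_width=1.0, decay=0.5,
                 dense_threshold=2.0, sparse_threshold=0.1):
        self.cell_width = cell_width
        self.decay = decay                        # assumed: weight multiplier per time unit
        self.dense_threshold = dense_threshold    # assumed cut-off for "dense" cells
        self.sparse_threshold = sparse_threshold  # assumed cut-off for pruning
        self.cells = {}                           # cell coords -> (density, last update time)

    def _cell_of(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(int(x // self.cell_width) for x in point)

    def _density_at(self, cell, now):
        # Damped window: a stored density fades exponentially with age.
        density, last = self.cells[cell]
        return density * self.decay ** (now - last)

    def insert(self, point, now):
        # Decay the cell's old density to the current time, then add the new point.
        cell = self._cell_of(point)
        old = self._density_at(cell, now) if cell in self.cells else 0.0
        self.cells[cell] = (old + 1.0, now)

    def prune(self, now):
        # Periodic pruning: drop cells whose decayed density is below the sparse cut-off.
        sparse = [c for c in self.cells
                  if self._density_at(c, now) < self.sparse_threshold]
        for c in sparse:
            del self.cells[c]
        return len(sparse)

    def clusters(self, now):
        # Depth-first search over dense cells; touching dense cells form one cluster.
        dense = {c for c in self.cells
                 if self._density_at(c, now) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], set()
            while stack:
                cell = stack.pop()
                component.add(cell)
                for offset in product((-1, 0, 1), repeat=len(cell)):
                    nb = tuple(a + b for a, b in zip(cell, offset))
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
            result.append(component)
        return result
```

In this sketch a caller would invoke `insert` for each arriving point, `prune` at a fixed interval, and `clusters` whenever a clustering request arrives. Comparing the outputs of `clusters` at two different times is one simple way to observe how the stream evolves.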
Keywords/Search Tags: Mining Data Stream, Clustering Analysis, Damped Window, Density Dimension Tree, Pruning Strategy