Research On Data Stream Clustering Algorithm Based On Double-layer Grid And Density

Posted on: 2015-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
Full Text: PDF
GTID: 2298330422983719
Subject: Computer application technology
Abstract/Summary:
Since the late 20th century, data acquisition technology has developed rapidly. Innovations in database systems and the rapid expansion of information have changed traditional approaches to data extraction, and the data stream has become a mainstream form of data. How to extract valuable information quickly and efficiently has therefore become a hot topic in data mining.

Because data streams are dynamic, data stream clustering must meet several requirements. First, clustering must run dynamically and process the data without interruption. Second, the mining results should be presented intuitively and simply. In addition, the algorithm should show how the data stream evolves and maintain the clustering results dynamically, reflecting the timeliness of the data stream.

Traditional grid-based data stream clustering algorithms cluster on grids of a single granularity, which improves processing speed but lowers clustering accuracy. To address this, a new data stream clustering algorithm, DBG-Stream, based on a double-layer grid and density, is proposed. The algorithm clusters the data stream using grids of two different granularities and, following the idea of the CluStream algorithm, divides the clustering process into two stages. In the online stage, coarse-grained grid cells are used to form the initial clusters. In the offline stage, the grid cells located on cluster boundaries are clustered a second time on fine-grained grid cells to improve clustering accuracy. The algorithm also sets its key parameters automatically and improves efficiency through a grid deletion strategy. Experimental results show that DBG-Stream achieves substantially higher clustering accuracy than the D-Stream algorithm and effectively addresses the problems of traditional grid-based clustering algorithms. The algorithm can discover clusters of arbitrary shape and is suitable for knowledge mining on large-scale data streams.
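
The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the thesis's DBG-Stream implementation: the abstract does not give the density definition, the thresholds, the boundary-cell rule, or the automatic parameter setting, so everything below is an assumption made for illustration. The class name `DoubleLayerGrid`, the exponential `decay` factor, the thresholds `dense`, `fine_dense`, and `sparse`, and the rule "a dense fine cell in a non-dense coarse cell joins a neighbouring cluster" are all hypothetical. The sketch also records fine-cell densities during the online step so the offline step has something to refine, which may differ from the original design.

```python
from itertools import product


def cell_of(point, size):
    """Index of the grid cell of width `size` that contains `point`."""
    return tuple(int(x // size) for x in point)


def neighbors(cell):
    """Yield every cell that touches `cell` (faces, edges, and corners)."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):
            yield tuple(c + o for c, o in zip(cell, offset))


class DoubleLayerGrid:
    """Illustrative two-granularity grid; names and defaults are hypothetical."""

    def __init__(self, coarse=1.0, ratio=2, decay=0.99,
                 dense=3.0, fine_dense=2.0, sparse=0.1):
        self.coarse = coarse          # coarse cell width
        self.ratio = ratio            # fine cells per coarse cell, per dimension
        self.fine = coarse / ratio    # fine cell width
        self.decay = decay            # per-step density decay factor (assumed)
        self.dense = dense            # coarse-cell density threshold (assumed)
        self.fine_dense = fine_dense  # fine-cell density threshold (assumed)
        self.sparse = sparse          # cells below this density get deleted
        self.t = 0                    # number of points seen so far
        self.coarse_cells = {}        # cell -> (density, time of last update)
        self.fine_cells = {}          # same layout, on the fine grid

    def _density(self, table, cell):
        """Density of `cell`, decayed to the current time step."""
        density, last = table.get(cell, (0.0, self.t))
        return density * self.decay ** (self.t - last)

    def insert(self, point):
        """Online step: add one point to both grid layers."""
        self.t += 1
        for table, size in ((self.coarse_cells, self.coarse),
                            (self.fine_cells, self.fine)):
            cell = cell_of(point, size)
            table[cell] = (self._density(table, cell) + 1.0, self.t)

    def prune(self):
        """Grid deletion: drop cells whose decayed density is below `sparse`."""
        for table in (self.coarse_cells, self.fine_cells):
            for cell in [c for c in table if self._density(table, c) < self.sparse]:
                del table[cell]

    def cluster(self):
        """Offline step: coarse clusters, then fine-grained boundary refinement."""
        dense = {c for c in self.coarse_cells
                 if self._density(self.coarse_cells, c) >= self.dense}
        labels, next_label = {}, 0
        for seed in sorted(dense):
            if seed in labels:
                continue
            # Flood-fill over adjacent dense coarse cells.
            labels[seed] = next_label
            stack = [seed]
            while stack:
                for n in neighbors(stack.pop()):
                    if n in dense and n not in labels:
                        labels[n] = next_label
                        stack.append(n)
            next_label += 1

        # Refinement: a dense fine cell inside a non-dense coarse cell joins the
        # first cluster whose coarse cell contains one of its fine neighbours.
        refined = {}
        for cell in sorted(self.fine_cells):
            parent = tuple(i // self.ratio for i in cell)
            if parent in labels or self._density(self.fine_cells, cell) < self.fine_dense:
                continue
            for n in neighbors(cell):
                owner = labels.get(tuple(i // self.ratio for i in n))
                if owner is not None:
                    refined[cell] = owner
                    break
        return labels, refined
```

In this sketch, `insert` plays the role of the online stage and `cluster` the offline stage. `cluster` returns two dictionaries: coarse cells mapped to cluster ids, and the fine cells recovered from boundary regions mapped to the cluster they were attached to. Points in a coarse cell that is too sparse to be a cluster member on its own can still be assigned to a neighbouring cluster if they concentrate in a fine cell next to it, which is one plausible reading of how a second, finer layer improves accuracy at cluster boundaries.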
Keywords/Search Tags: data mining, data stream, cluster, cluster analysis, density, double-layer