The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream

Posted on: 2011-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2178360305465650
Subject: Computer software and theory
Abstract/Summary:
Clustering analysis is an important data mining task with a wide range of applications, and these applications place new requirements on clustering algorithms.

This thesis proposes a grid-based parallel clustering algorithm for multi-density datasets, called PGMCLU. Its contributions are as follows. It defines the concepts of grid compactness, grid density-connectedness, grid feature, cluster density and cluster similarity. It proposes a data partitioning method based on grid partitioning, a local clustering method based on the grid density-connected concept, and a method for merging local clusters based on a cluster similarity measure. It also sets the minPts parameter adaptively. PGMCLU handles high-dimensional and massive datasets better, and can identify clusters of differing shape and density.

A data stream is an unbounded, continuous, high-speed, time-ordered sequence of data objects. Because a data stream arrives in real time and never ends, a clustering algorithm for data streams must have properties that a traditional clustering algorithm for static datasets does not need.

This thesis also proposes a grid-based clustering algorithm for data streams, abbreviated as GC-Stream. Its contributions are as follows. It proposes the grid feature vector, which describes the summary information of a grid. It improves the SP-Tree structure and proposes a new spatial index structure, LSP-Tree, based on the List data structure. It proposes an exponential damping strategy for grid information, and a pruning strategy for noisy grids and outdated grids. GC-Stream better satisfies the real-time requirement of data stream clustering and adapts to the available memory size.

Detailed experiments demonstrate the correctness and effectiveness of PGMCLU and GC-Stream, so both algorithms have theoretical value and practical use.
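The exponential damping and pruning ideas mentioned for GC-Stream can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas or data structures, so everything here is an assumption made for illustration: the names `CellSummary` and `DecayingGrid`, the decay form `weight * exp(-decay_rate * elapsed)`, the fixed cell width, and the single pruning threshold `min_weight`. It is not the GC-Stream implementation, and it uses a plain dictionary rather than the LSP-Tree.

```python
import math
from dataclasses import dataclass


@dataclass
class CellSummary:
    """Hypothetical stand-in for a grid feature vector."""
    weight: float = 0.0       # exponentially decayed point count
    last_update: float = 0.0  # timestamp of the most recent update


class DecayingGrid:
    """Grid of cell summaries with exponential decay (assumed form)."""

    def __init__(self, cell_width, decay_rate, min_weight):
        self.cell_width = cell_width
        self.decay_rate = decay_rate
        self.min_weight = min_weight
        self.cells = {}  # cell key (tuple of ints) -> CellSummary

    def cell_key(self, point):
        # Map each coordinate to the index of the cell containing it.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _decayed(self, cell, now):
        # Weight the cell would have at time `now` with no new points.
        elapsed = now - cell.last_update
        return cell.weight * math.exp(-self.decay_rate * elapsed)

    def insert(self, point, now):
        # Decay the old weight to `now`, then count the new point.
        key = self.cell_key(point)
        cell = self.cells.setdefault(key, CellSummary(last_update=now))
        cell.weight = self._decayed(cell, now) + 1.0
        cell.last_update = now

    def prune(self, now):
        # Drop cells whose decayed weight fell below the threshold;
        # this covers both sparse (noisy) and stale (outdated) cells.
        stale = [k for k, c in self.cells.items()
                 if self._decayed(c, now) < self.min_weight]
        for k in stale:
            del self.cells[k]
        return len(stale)
```

In this sketch each point touches only one cell summary, so the per-point cost does not depend on how many points have already arrived, and pruning bounds the number of cells kept in memory. How the thesis distinguishes noisy grids from outdated ones is not stated in the abstract; the single threshold above is a simplification.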
Keywords/Search Tags: grid, parallelism, clustering analysis, multi-density cluster, clustering analysis for data stream, spatial partition tree based on List data structure