Research on Data Stream Clustering Algorithms Based on Similarity and Grid Partition Optimization

Posted on: 2013-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Guo
Full Text: PDF
GTID: 2218330362963171
Subject: Computer application technology
Abstract/Summary:
Data stream clustering has become an active research topic in data mining. Because data streams are massive, fast-changing, temporally ordered, and potentially infinite, and because memory is limited, traditional clustering algorithms cannot be applied to them directly. Research on clustering algorithms adapted to these characteristics therefore has both theoretical and practical significance. This thesis focuses on grid-based and density-based clustering algorithms over data streams, and addresses the loss of clustering accuracy caused by leaving boundary points unhandled and by the "hard division" of grids. The results have broad application prospects in wireless sensors, Web analysis, software security, and industrial control.

First, a clustering algorithm based on a density grid-tree and similarity is proposed. The algorithm adopts the two-phase framework of CluStream. In the online component, each data record is mapped to a grid, and a tree-based summary data structure is used to store the empty grids. In the offline component, clustering is performed based on density, and similarity is used to handle the boundary points.

Second, a clustering algorithm based on a nonuniform grid is proposed. The data space is first divided evenly, and isolated grids are defined and then deleted. Sub-grids are built using the centroid of each low-density grid as the center, and dense grids are merged according to a depth-first strategy. The neighbor grids of a sub-grid are defined, and each dense sub-grid is merged into the corresponding cluster according to the minimum distance and the density similarity between the dense sub-grid and its neighboring grids.

The algorithms in this thesis are implemented in C++. Experimental results show that the proposed algorithms outperform the comparison algorithms in both clustering quality and performance.
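
The two steps common to both algorithms, mapping records to grids and merging dense grids depth-first, can be illustrated with a minimal sketch. This is not the thesis's implementation (which is in C++ and is not reproduced here). It assumes a uniform grid, a plain record count as the density, and face adjacency as the neighbor relation, and it omits the grid-tree, the similarity-based handling of boundary points, and the sub-grids. The names `map_to_cell`, `count_densities`, `face_neighbors`, `merge_dense_cells`, `cell_width`, and `min_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import floor


def map_to_cell(point, cell_width):
    """Return the integer grid coordinates of the cell containing `point`."""
    return tuple(floor(x / cell_width) for x in point)


def count_densities(points, cell_width):
    """Online step (simplified): count how many records fall in each cell."""
    density = defaultdict(int)
    for p in points:
        density[map_to_cell(p, cell_width)] += 1
    return density


def face_neighbors(cell):
    """Yield the cells that differ from `cell` by 1 in exactly one dimension."""
    for i in range(len(cell)):
        for step in (-1, 1):
            yield cell[:i] + (cell[i] + step,) + cell[i + 1:]


def merge_dense_cells(density, min_count):
    """Offline step (simplified): group adjacent dense cells depth-first."""
    # Assumption: a cell is "dense" when it holds at least `min_count` records.
    dense = {c for c, n in density.items() if n >= min_count}
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]  # explicit stack gives a depth-first traversal
        while stack:
            cell = stack.pop()
            for nb in face_neighbors(cell):
                if nb in dense and nb not in labels:
                    labels[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return labels


points = [
    (0.1, 0.2), (0.3, 0.4), (0.6, 0.1),  # cell (0, 0)
    (1.2, 0.3), (1.5, 0.8), (1.9, 0.5),  # cell (1, 0)
    (5.1, 5.2), (5.4, 5.9), (5.7, 5.5),  # cell (5, 5)
    (3.3, 3.3),                          # cell (3, 3), too sparse to cluster
]
density = count_densities(points, cell_width=1.0)
labels = merge_dense_cells(density, min_count=3)
```

In this toy input the adjacent dense cells (0, 0) and (1, 0) end up in one cluster, (5, 5) forms a second cluster, and the sparse cell (3, 3) receives no label. Records in such unlabeled cells are the kind of boundary case that the abstract says is handled with similarity.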
Keywords/Search Tags: data stream, clustering, density grid-tree, nonuniform grid, similarity