
A Kind Of Data Stream Clustering Algorithm Based On Expand Grid-density

Posted on: 2013-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2248330395985997
Subject: Computer application technology
Abstract/Summary:
As the times advance, the volume of information is growing rapidly. Large amounts of data are generated in many domains, such as financial services, stock trading, electronic commerce, network transmission, intrusion detection, satellite observation, weather monitoring, and telecom infrastructure. Unlike traditional data, these data are massive, change rapidly, and arrive in the form of streams. As information grows ever more abundant, extracting useful knowledge from data streams has become both a difficulty and a hot spot of research, and data mining has emerged as a popular technology in response. Cluster analysis is one of the key technologies of data mining, and research on data stream clustering methods builds on traditional clustering algorithms.

Based on the complementarity of grid-based and density-based algorithms, this paper presents a new data stream clustering algorithm based on expand grid and density. Density-based clustering can find clusters of various shapes, but its computation is relatively complex. Grid-based clustering does not match the quality of density-based methods, but it can summarize data and cluster quickly through simple division and merging of grids. Combining the two can therefore achieve a better clustering effect. When dividing grid units, this paper introduces the concept of the expand grid, obtained by extending the original grid unit, and includes the influence of the points in the expand region when calculating the density of a grid unit. This avoids the loss of useful information about the data space that occurs when the number of data points in a grid unit is used directly as its density, and it enables efficient clustering of boundary points.
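The expand-grid density idea can be illustrated with a minimal Python sketch. The abstract does not give the actual density formula, so everything here is an assumption made for illustration: the function name `expanded_grid_density`, the parameters `cell_size`, `margin`, and `outer_weight`, and the rule that a point adds 1.0 to its own cell and a smaller fixed weight to each adjacent cell whose expand region (the cell grown by `margin` on every side) contains it.

```python
from collections import defaultdict
from itertools import product
import math


def expanded_grid_density(points, cell_size, margin, outer_weight=0.5):
    """Return {cell_index: density} for a uniform grid.

    Assumed weighting (not taken from the thesis): a point adds 1.0 to the
    cell that contains it and `outer_weight` to every adjacent cell whose
    expand region, the cell grown by `margin` on each side, contains it.
    """
    if not 0 <= margin < cell_size:
        raise ValueError("margin must satisfy 0 <= margin < cell_size")
    density = defaultdict(float)
    for p in points:
        home = tuple(math.floor(x / cell_size) for x in p)
        density[home] += 1.0
        # For each dimension, collect the neighbour offsets whose expand
        # region reaches this point.
        options = []
        for x, h in zip(p, home):
            to_lower_face = x - h * cell_size
            to_upper_face = (h + 1) * cell_size - x
            offsets = [0]
            if to_lower_face < margin:
                offsets.append(-1)
            if to_upper_face < margin:
                offsets.append(1)
            options.append(offsets)
        for offset in product(*options):
            if any(offset):  # skip the home cell itself
                cell = tuple(h + o for h, o in zip(home, offset))
                density[cell] += outer_weight
    return dict(density)


points = [(0.5, 0.5), (0.95, 0.5), (1.5, 0.5)]
plain = expanded_grid_density(points, cell_size=1.0, margin=0.0)
expanded = expanded_grid_density(points, cell_size=1.0, margin=0.1)
```

With `margin=0.0` the function reduces to plain point counting. With a positive margin, the point at `(0.95, 0.5)` also contributes to the neighbouring cell `(1, 0)`, which is the kind of boundary information that plain counting discards.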
At the same time, setting the density threshold manually requires users to have background knowledge of the relevant field. This paper presents an adaptive method for calculating the density threshold that adapts to dynamic changes in the data stream, thereby reducing the burden on users. Using a sliding window mechanism and optimized merge rules based on the connectivity of dense grid units, this paper then builds the algorithm framework for cluster initialization and update and implements it on the expand grid structure.

Many comparison experiments were carried out on the sliding window step size, the number of grids, and the quality and efficiency of clustering, among other settings. The experimental results show that the algorithm achieves better clustering quality and high real-time clustering efficiency.
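The adaptive threshold, connectivity-based merging, and sliding window can be sketched together in Python. The abstract does not state the actual threshold formula or merge rules, so the sketch rests on stand-in assumptions: the threshold is the mean density over non-empty cells, dense cells that share a face are merged into one cluster, and the density of each window is a plain point count per cell. The names `adaptive_threshold`, `merge_dense_cells`, and `cluster_stream` are hypothetical.

```python
from collections import Counter, deque
import math


def adaptive_threshold(density):
    """Assumed rule (not from the thesis): mean density of non-empty cells."""
    return sum(density.values()) / len(density) if density else 0.0


def merge_dense_cells(density, threshold):
    """Group cells with density >= threshold into face-connected clusters."""
    dense = {cell for cell, d in density.items() if d >= threshold}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first search over adjacent dense cells
            cell = queue.popleft()
            members.append(cell)
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(sorted(members))
    return clusters


def cluster_stream(stream, cell_size, window, step):
    """Yield clusters over the last `window` points, once every `step` arrivals."""
    recent = deque(maxlen=window)  # old points fall out automatically
    for i, point in enumerate(stream, start=1):
        recent.append(point)
        if i >= window and (i - window) % step == 0:
            density = Counter(
                tuple(math.floor(x / cell_size) for x in q) for q in recent
            )
            yield merge_dense_cells(density, adaptive_threshold(density))


density = {(0, 0): 4.0, (0, 1): 3.0, (5, 5): 3.0, (9, 9): 1.0, (2, 2): 1.0}
clusters = merge_dense_cells(density, adaptive_threshold(density))

stream = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (5.5, 5.5), (5.6, 5.6), (5.7, 5.7)]
snapshots = list(cluster_stream(stream, cell_size=1.0, window=3, step=3))
```

Recomputing the grid from scratch for every window keeps the sketch short. An incremental update that adjusts only the cells touched by arriving and expiring points would fit a real-time setting better, but the abstract does not describe how the thesis does this.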
Keywords/Search Tags:Data Stream, Expand Grid, Density, Adaptive