Research And Improvement On Stream Data Clustering Algorithm

Posted on: 2015-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: J F Li
Full Text: PDF
GTID: 2298330452494374
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer technology and global informatization, the era of big data has arrived. People need to extract useful information from data to support better decision-making and development. Given the large volumes of streaming data now being produced, obtaining knowledge from data streams is becoming increasingly important.

The goal of this thesis is to design and implement a data stream clustering algorithm that offers both good clustering quality and high efficiency. The main work is as follows: after studying the topic in depth, we examine the characteristics of data streams and the requirements and techniques of data stream clustering; summarize the advantages and disadvantages of current classical data stream clustering algorithms; and study grid-based clustering algorithms. On this basis, we design and implement the DD-Stream algorithm, which uses a two-layer framework. The online layer continuously receives data points, divides the data space into a grid structure according to the radius size of each dimension and certain partitioning rules, and stores feature information in that grid structure. In the offline layer, we define the gap as the minimum time for a dense grid to decay into a sparse grid; at every gap, the grids are checked and updated, and then clustered according to grid density and connectivity. Clustering consists of initial clustering and cluster adjustment: initial clustering is performed only at the first gap, and at each later gap the algorithm detects the grids that satisfy the deletion condition and adjusts the clusters.

Finally, we ran experiments with the DD-Stream algorithm on both artificial and real datasets. The results show that the algorithm achieves satisfactory clustering quality and efficiency and can cluster data streams efficiently.
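The online/offline grid design described in the abstract can be illustrated with a minimal sketch. This is not the thesis's DD-Stream implementation: the class name `GridStreamSketch`, the uniform `cell_size`, the exponential `decay` factor, and the `dense_threshold` and `sparse_threshold` values are all assumptions made for illustration, since the abstract does not give the actual partitioning rules, density definition, or deletion condition.

```python
from collections import deque
from math import floor


class GridStreamSketch:
    """Illustrative two-layer grid clustering; NOT the thesis's DD-Stream code."""

    def __init__(self, cell_size, decay=0.998, dense_threshold=3.0):
        self.cell_size = cell_size              # assumed: same cell width in every dimension
        self.decay = decay                      # assumed: exponential decay per time unit
        self.dense_threshold = dense_threshold  # assumed: minimum density of a dense grid
        self.grids = {}                         # cell key -> (density, time of last update)

    def _key(self, point):
        # Map a point to the integer coordinates of the cell that contains it.
        return tuple(floor(x / self.cell_size) for x in point)

    def _density_at(self, key, now):
        # Decay the stored density from its last update time up to `now`.
        density, last = self.grids[key]
        return density * self.decay ** (now - last)

    def insert(self, point, now):
        """Online layer: map the point to its cell and update the decayed density."""
        key = self._key(point)
        old = self._density_at(key, now) if key in self.grids else 0.0
        self.grids[key] = (old + 1.0, now)

    def prune(self, now, sparse_threshold=0.1):
        """Delete cells whose decayed density fell below an assumed threshold."""
        stale = [k for k in self.grids if self._density_at(k, now) < sparse_threshold]
        for key in stale:
            del self.grids[key]

    def cluster(self, now):
        """Offline layer: group dense cells that are connected through shared faces."""
        dense = {k for k in self.grids
                 if self._density_at(k, now) >= self.dense_threshold}
        clusters, seen = [], set()
        for start in sorted(dense):
            if start in seen:
                continue
            seen.add(start)
            queue, members = deque([start]), []
            while queue:  # breadth-first search over neighbouring dense cells
                cell = queue.popleft()
                members.append(cell)
                for dim in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            clusters.append(sorted(members))
        return clusters
```

In this sketch a caller would feed each arriving point to `insert`, then call `prune` and `cluster` once per gap. Unlike the approach the abstract describes, it re-clusters from scratch every time instead of adjusting existing clusters incrementally.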
Keywords/Search Tags: Data stream clustering, Double-layer framework, Grid density, DD-Stream