Research On Data Stream Clustering Algorithm Based On Grid And Density

Posted on: 2011-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: J F Ding
Full Text: PDF
GTID: 2178330332460048
Subject: Computer application technology
Abstract/Summary:
Clustering is difficult in the context of data stream mining because large volumes of data arrive quickly and continuously, which makes most traditional clustering algorithms too inefficient. Because traditional clustering algorithms cannot meet application requirements, they greatly restrict the application and development of data stream mining. Research on clustering algorithms adapted to the characteristics of data streams therefore has great practical significance.
This thesis discusses traditional clustering algorithms and data stream clustering algorithms and analyzes the advantages and disadvantages of each. On this basis, it proposes GDClu, a grid- and density-based clustering algorithm for data streams. GDClu is a framework built on traditional clustering algorithms, key data stream techniques, and popular data stream clustering algorithms. Following the framework of the CluStream algorithm, it divides data stream clustering into an online layer and an offline layer. The online layer rapidly reads data as they arrive, maps each datum into its corresponding grid cell and region, and stores the resulting statistical information in the characteristic vector of the grid cell, which serves as the synopsis data structure. In addition, a modified pyramid time frame without redundancy periodically stores snapshots of the synopsis information, so that users can cluster historical data and analyze the evolution of the stream. The offline layer produces accurate clusters from the online layer's synopsis information according to the density of the grid cells. The two layers work together to balance accuracy and speed. To improve clustering quality, the subdivided regions within a grid cell serve as the minimal clustering unit.
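As a rough illustration of the online layer described above, the following Python sketch maps each arriving point to a grid cell and updates a small per-cell summary. It is only a sketch under stated assumptions: the abstract does not specify the contents of the characteristic vector, so the fields used here (point count, per-dimension sum, last arrival time) and the names `CellSummary`, `OnlineGrid`, and `cell_width` are hypothetical, and the subdivision of cells into regions is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CellSummary:
    """Hypothetical per-cell synopsis; the thesis's actual vector may differ."""
    count: int = 0
    linear_sum: list = field(default_factory=list)
    last_seen: int = -1


def cell_index(point, cell_width):
    """Return the integer coordinates of the grid cell containing `point`."""
    return tuple(math.floor(x / cell_width) for x in point)


class OnlineGrid:
    """Keeps one summary per non-empty grid cell (assumed uniform cell width)."""

    def __init__(self, cell_width):
        self.cell_width = cell_width
        self.cells = {}

    def insert(self, point, timestamp):
        """Fold one arriving point into the summary of its grid cell."""
        key = cell_index(point, self.cell_width)
        if key not in self.cells:
            self.cells[key] = CellSummary(linear_sum=[0.0] * len(point))
        summary = self.cells[key]
        summary.count += 1
        summary.linear_sum = [s + x for s, x in zip(summary.linear_sum, point)]
        summary.last_seen = timestamp
        return key


# Example stream: two points fall in cell (0, 0), one in cell (2, 2).
grid = OnlineGrid(cell_width=1.0)
for t, p in enumerate([(0.2, 0.3), (0.8, 0.1), (2.5, 2.5)]):
    grid.insert(p, t)
```

Each point is processed once and only the per-cell summaries are retained, which is what allows a layer of this kind to keep pace with a fast stream.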
A grid cell belongs to a cluster as long as it contains a DENSE region adjacent to a DENSE region on the fringe of the cluster. A second algorithm is therefore proposed that decides whether two DENSE regions in adjacent grid cells are themselves adjacent. It is used to determine whether a grid cell on the fringe of a cluster and a neighboring grid cell contain directly adjacent DENSE regions, which supports the GDClu algorithm more effectively.
The experimental results show that the GDClu algorithm can discover any number of clusters of arbitrary shape and can eliminate noise data effectively, with high efficiency and quality. The algorithm is thus an efficient clustering algorithm for data stream mining with good application prospects.
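The offline step can be illustrated with a simplified Python sketch that groups adjacent dense grid cells into clusters. This is not the thesis's algorithm: GDClu works on subdivided regions inside each cell and uses a dedicated adjacency test between DENSE regions, whereas this sketch assumes whole cells as the clustering unit, a fixed `density_threshold`, and face adjacency (cells differing by one step in exactly one dimension). All names are hypothetical.

```python
from collections import deque


def neighbors(cell):
    """Yield the cells that share a face with `cell`."""
    for dim in range(len(cell)):
        for step in (-1, 1):
            yield cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]


def cluster_dense_cells(cell_counts, density_threshold):
    """Label connected groups of dense cells; sparse cells get no label."""
    dense = {c for c, n in cell_counts.items() if n >= density_threshold}
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        # Breadth-first search over adjacent dense cells.
        labels[start] = next_label
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nb in neighbors(current):
                if nb in dense and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


# Example: three connected dense cells, one isolated dense cell, one sparse cell.
counts = {(0, 0): 5, (0, 1): 4, (1, 1): 6, (5, 5): 7, (3, 3): 1}
labels = cluster_dense_cells(counts, density_threshold=3)
```

Because clusters grow by following adjacency rather than by distance to a centroid, the number of clusters does not need to be fixed in advance and their shapes are not constrained, while cells below the threshold are left unlabeled and can be treated as noise.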
Keywords/Search Tags: Data mining, Data stream, Clustering algorithm, Grid and density