Research On Grid-based MST Data Stream Clustering Algorithm

Posted on: 2010-02-15, Degree: Master, Type: Thesis
Country: China, Candidate: X P Wang, Full Text: PDF
GTID: 2178360272979364, Subject: Computer system architecture
Abstract/Summary:
Clustering analysis is an important research topic in data mining. With the rapid development of information technology in recent years, a dynamic form of data, the data stream, has come into increasingly wide use. Unlike traditional static data stored on disk, a data stream is a massive, high-speed data set that is continuous, dynamic, and fast-changing, so it can only be accessed sequentially, in a single pass or a small number of passes. These characteristics make mining data streams difficult and place much higher demands on clustering algorithms. Data streams have recently become a research focus in data mining, and data stream clustering has become an important topic in clustering research.

This thesis first introduces the theory and techniques of data stream mining and analyzes the characteristics of data streams by considering streaming data alongside traditional static data. It then compares existing traditional clustering algorithms with data stream clustering algorithms and identifies the advantages and deficiencies of each. Next, it presents the grid partitioning method used in clustering algorithms and its effect on clustering analysis, together with a study of grid-based clustering algorithms. On this basis, a new data stream clustering algorithm, GTSClu, is proposed. GTSClu is a grid-based minimum spanning tree data stream clustering algorithm that combines grid and minimum spanning tree techniques and is divided into an online processing part and an offline clustering part. The online part collects data stream information by partitioning the data space into a uniform grid; the offline part clusters the information gathered online by repartitioning the grid space into a non-uniform grid and applying the minimum spanning tree technique.
As a result, GTSClu can find clusters of arbitrary shape and number and can handle noisy data effectively, improving both the efficiency and the quality of clustering.

The experimental results show that GTSClu is capable of discovering clusters of arbitrary shape and is insensitive to the order in which data arrive. Furthermore, the grid partitioning technique gives the algorithm high efficiency and clustering accuracy and lets it distinguish noise effectively. The algorithm is suited to large-scale data stream processing.
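
The abstract gives no implementation details for GTSClu, so the following is only a minimal sketch of the general idea of pairing an online uniform grid with an offline minimum spanning tree. It is not the thesis's algorithm. The class name `GridMSTSketch` and the parameters `cell_size`, `density_threshold`, and `cut_length` are hypothetical, and the non-uniform regridding step mentioned in the abstract is omitted. The online step keeps one counter per grid cell. The offline step discards sparse cells as noise, builds a minimum spanning tree over the remaining cells with Prim's algorithm, and cuts long edges so that each remaining connected component is one cluster.

```python
import math
from collections import Counter


class GridMSTSketch:
    """Illustrative grid + MST stream clustering; NOT the thesis's GTSClu."""

    def __init__(self, cell_size, density_threshold, cut_length):
        self.cell_size = cell_size                  # hypothetical: side length of a uniform cell
        self.density_threshold = density_threshold  # hypothetical: min points for a dense cell
        self.cut_length = cut_length                # hypothetical: longer MST edges are cut
        self.counts = Counter()                     # cell index -> number of points seen

    def insert(self, point):
        """Online step: a single pass that only updates a per-cell counter."""
        cell = tuple(math.floor(x / self.cell_size) for x in point)
        self.counts[cell] += 1

    def clusters(self):
        """Offline step: MST over dense cells, then cut long edges."""
        dense = sorted(c for c, n in self.counts.items()
                       if n >= self.density_threshold)
        n = len(dense)
        if n == 0:
            return []

        # Prim's algorithm on the complete graph of dense cells, with
        # Euclidean distance between cell indices as the edge weight.
        in_tree = [False] * n
        best = [math.inf] * n
        parent = [-1] * n
        best[0] = 0.0
        edges = []
        for _ in range(n):
            u = min((i for i in range(n) if not in_tree[i]),
                    key=lambda i: best[i])
            in_tree[u] = True
            if parent[u] >= 0:
                edges.append((parent[u], u, best[u]))
            for v in range(n):
                if not in_tree[v]:
                    d = math.dist(dense[u], dense[v])
                    if d < best[v]:
                        best[v] = d
                        parent[v] = u

        # Union-find over the MST edges that survive the cut.
        root = list(range(n))

        def find(i):
            while root[i] != i:
                root[i] = root[root[i]]
                i = root[i]
            return i

        for a, b, length in edges:
            if length <= self.cut_length:
                root[find(a)] = find(b)

        groups = {}
        for i, cell in enumerate(dense):
            groups.setdefault(find(i), []).append(cell)
        return sorted(sorted(g) for g in groups.values())
```

In this sketch the result depends only on the final per-cell counts, so feeding the same points in a different order yields the same clusters. That is one simple way a grid summary can give the kind of order insensitivity the abstract reports.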
Keywords/Search Tags: clustering analysis, data stream, grid, minimum spanning tree