
The Research Of Clustering Algorithm Based On Data Stream

Posted on: 2011-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Y He
Full Text: PDF
GTID: 2178360308467895
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rapid development of information technology, a new form of data known as the data stream has come into use in a wide range of fields. Data in a stream is dynamic, ordered, continuous, and potentially unbounded; it can be accessed only in arrival order and read once or a limited number of times. Sensor readings, stock prices, network traffic, and financial securities data are common examples of data streams.

This thesis introduces data stream clustering methods from data mining, analyzes the strengths and weaknesses of existing algorithms, and, taking into account the characteristics of real-world data streams and practical applications, proposes a clustering algorithm for data streams that contain noisy data. The work consists of the following parts:

(1) A survey of traditional clustering algorithms and classic data stream clustering algorithms: the requirements, classification, and comparison of traditional clustering algorithms; the characteristics and requirements of data stream clustering; and an analysis and comparison of several classic algorithms. This lays the foundation for the further research on data stream clustering.

(2) A detailed introduction to the two-tier data stream clustering framework. The online layer processes each newly arriving valid data point simply and quickly, and builds and stores summary information about the data. The offline layer takes the output of the online layer as input and applies a relatively complex but efficient clustering algorithm to obtain high-quality clustering results. The grid- and density-based data stream clustering studied in this thesis uses this two-tier framework.

(3) To address the problems of traditional clustering algorithms, this thesis presents GDDStream, a data stream clustering algorithm based on grid and density. The algorithm uses the online layer/offline layer data stream clustering framework.
The online layer maps rapidly arriving real-time data onto the grid, that is, it assigns each data object to the corresponding grid cell according to its attribute values. The offline layer adaptively adjusts the clusters as densities change, continuously updates the feature vector of each grid cell, and merges cells or clusters based on density. To improve clustering quality and speed, the algorithm handles noise points explicitly: it uses density to distinguish dynamic changes in the real data from noise, and removes the noise points. This reduces both storage space and computation, which greatly improves the efficiency of the algorithm.

Finally, simulation experiments on the proposed GDDStream algorithm show that it has better scalability, faster processing speed, and higher clustering quality, and that it can find clusters of arbitrary shape. In the application part, intrusion detection is taken as a representative data stream environment: the thesis gives a general discussion of applying GDDStream in that setting and proposes a preliminary intrusion detection method.
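
The two-tier idea described in the abstract can be illustrated with a minimal Python sketch. This is not the GDDStream algorithm itself, whose details the abstract does not give. It is a generic grid- and density-based scheme under stated assumptions: the class name `GridDensitySketch`, the exponential `decay` factor, the `dense_threshold` and `sparse_threshold` parameters, and the rule that neighbouring dense cells (including diagonal neighbours) belong to one cluster are all hypothetical choices made for illustration.

```python
import math
from itertools import product


class GridDensitySketch:
    """Illustrative grid/density stream clustering (NOT the thesis's GDDStream)."""

    def __init__(self, cell_width=1.0, decay=0.98,
                 dense_threshold=3.0, sparse_threshold=0.5):
        # All parameters are assumptions made for this sketch.
        self.cell_width = cell_width
        self.decay = decay
        self.dense_threshold = dense_threshold
        self.sparse_threshold = sparse_threshold
        self.cells = {}  # cell key -> (density, time of last update)
        self.t = 0       # logical clock: number of points seen so far

    def _key(self, point):
        # Map each attribute value to the index of its grid interval.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _density_now(self, key):
        # Decay the stored density forward to the current time.
        density, last = self.cells[key]
        return density * self.decay ** (self.t - last)

    def insert(self, point):
        """Online layer: constant-time update of one cell's summary."""
        self.t += 1
        key = self._key(point)
        density = self._density_now(key) if key in self.cells else 0.0
        self.cells[key] = (density + 1.0, self.t)

    def cluster(self):
        """Offline layer: drop sparse (noise) cells, merge adjacent dense cells."""
        for key in list(self.cells):
            if self._density_now(key) < self.sparse_threshold:
                del self.cells[key]
        dense = {k for k in self.cells
                 if self._density_now(k) >= self.dense_threshold}

        clusters, seen = [], set()
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            component, stack = set(), [start]
            while stack:  # depth-first flood fill over neighbouring dense cells
                cell = stack.pop()
                component.add(cell)
                for offset in product((-1, 0, 1), repeat=len(cell)):
                    neighbour = tuple(c + o for c, o in zip(cell, offset))
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
            clusters.append(component)
        return clusters
```

In use, `insert` would be called once per arriving point and `cluster` only periodically, which mirrors the split between a cheap online layer and a more expensive offline layer. Because clusters are built from connected dense cells instead of distances to a centre, this kind of scheme can follow clusters of arbitrary shape.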
Keywords/Search Tags: Data Mining, Grid, Density, Clustering Analysis, Data Stream