
Research On Clustering Analysis Algorithm For Real Time Data Stream

Posted on: 2018-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: W R Jia
Full Text: PDF
GTID: 2348330515957465
Subject: Computer application technology
Abstract/Summary:
With the development of Internet technology, the real-time data stream has become one of the most important data models, widely used in network traffic control, data monitoring systems, Internet banking and other areas. Extracting useful information quickly and effectively from high-speed, high-volume real-time data has become a major challenge in data mining. Clustering analysis is an important data mining technique, and this thesis focuses on clustering algorithms for real-time data streams. Traditional clustering algorithms mainly process static data. Because real-time data streams are high-speed, real-time and continuous, traditional clustering algorithms cannot be applied to them directly. Researchers have proposed a variety of clustering algorithms for data streams, among which density-grid-based algorithms best reflect the real-time nature of the data. However, they still have problems such as low precision in grid boundary processing, a single grid structure and a lack of dynamic grid adjustment. This thesis addresses these deficiencies and proposes a density- and grid-based clustering algorithm for real-time data streams, called DSG-Stream. The algorithm follows a two-stage processing framework: the online layer dynamically forms the initial micro-clusters, and the offline layer obtains the final clustering result through macro-clustering. The algorithm uses a multi-granularity gridding strategy: according to their position within a cluster, grids are divided into internal grids and boundary grids, with internal grids clustered at coarse granularity and boundary grids processed at fine granularity. It also includes detection and handling of isolated grids and dynamic adjustment of the grid density threshold, so as to improve the efficiency of the algorithm. To further improve processing efficiency, a model of the algorithm for a distributed environment is also designed. Experimental comparison and analysis show that DSG-Stream has high clustering accuracy, strong real-time performance and stable running behavior. Finally, DSG-Stream is applied to an optical fiber protection monitoring system, and the results show that the algorithm performs well in that system.
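The two-stage framework described above can be illustrated with a minimal sketch of generic density-grid stream clustering. This is not the DSG-Stream algorithm itself: the abstract gives no formulas, so the class name `GridStreamSketch`, the exponential decay factor, the fixed density threshold and the neighbor rule are all assumptions made for illustration, and the multi-granularity boundary handling and dynamic threshold adjustment are omitted.

```python
import math
from collections import deque
from itertools import product


class GridStreamSketch:
    """Hypothetical density-grid stream clusterer (illustrative only)."""

    def __init__(self, cell_size=1.0, decay=0.998, dense_threshold=3.0):
        self.cell_size = cell_size              # assumed uniform grid width
        self.decay = decay                      # assumed per-step decay factor
        self.dense_threshold = dense_threshold  # assumed fixed threshold
        self.cells = {}                         # cell -> (density, last update time)
        self.t = 0                              # number of points seen so far

    def _cell(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def density(self, cell):
        # Decay the stored density lazily, up to the current time step.
        d, last = self.cells[cell]
        return d * self.decay ** (self.t - last)

    def insert(self, point):
        # Online layer: update the density of the cell the point falls into.
        self.t += 1
        cell = self._cell(point)
        d = self.density(cell) + 1.0 if cell in self.cells else 1.0
        self.cells[cell] = (d, self.t)

    def clusters(self):
        # Offline layer: merge adjacent dense cells into macro-clusters (BFS).
        dense = {c for c in self.cells if self.density(c) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            queue, group = deque([start]), set()
            while queue:
                c = queue.popleft()
                group.add(c)
                # Neighbors differ by at most one step in every dimension.
                for offset in product((-1, 0, 1), repeat=len(c)):
                    n = tuple(a + b for a, b in zip(c, offset))
                    if n in dense and n not in seen:
                        seen.add(n)
                        queue.append(n)
            result.append(group)
        return result
```

In this sketch, `insert` would be called once per arriving point and `clusters` only when a result is requested, which mirrors the online/offline split. Cells below the threshold are simply left out of every cluster, a much cruder rule than the isolated-grid handling the abstract mentions.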
Keywords/Search Tags: real time data stream, clustering analysis, data stream mining, density grid, boundary grid