
Research On Distributed Stream Data Clustering Algorithm Based On Density And Tilted-Time Window

Posted on: 2020-12-28  Degree: Master  Type: Thesis
Country: China  Candidate: G G Zhang  Full Text: PDF
GTID: 2428330578961740  Subject: Software engineering
Abstract/Summary:
Applications such as stock trading, real-time road-condition monitoring, and network intrusion detection now produce large volumes of data streams that arrive continuously over time and change dynamically in real time. Because a data stream differs from a traditional static dataset, clustering algorithms designed for static datasets can no longer cluster it effectively. Researchers have therefore developed many clustering algorithms tailored to the characteristics of data streams. DenStream, a density-based data stream clustering algorithm, is widely used because it can find clusters of arbitrary shape and handle outliers effectively. However, DenStream supports neither distributed parallel computation nor evolution analysis of a real-time data stream within a specified time window, so it needs further improvement.

To address the lack of distributed parallel computation in DenStream, a distributed data stream clustering algorithm, D-DenStream, is proposed. The algorithm has three steps: initialization, online microcluster maintenance, and offline clustering. Online microcluster maintenance has two phases: real-time updating at the local nodes and merging at the global node. The local nodes update microclusters in parallel in real time, and the global node merges them to obtain the global microclusters. D-DenStream is then deployed on a Storm cluster to improve its processing efficiency. Finally, experiments are designed to verify the clustering quality and processing efficiency of the algorithm. The results show that D-DenStream achieves clustering quality similar to that of DenStream while improving processing efficiency by a factor of two.

To address the lack of evolution analysis of real-time data streams within a specified time window, a data stream clustering algorithm based on the tilted-time window, TTW-DenStream, is proposed. The tilted-time window model is applied to the algorithm so that the evolution of a real-time data stream can be analyzed within a specified time window. A distributed implementation of TTW-DenStream is then proposed and deployed on a Storm cluster to improve its processing efficiency. Finally, experiments are designed to verify the effectiveness of the algorithm. The results show that TTW-DenStream can cluster data streams in real time and that its clustering results support evolution analysis. This thesis also applies TTW-DenStream to the analysis of taxi hotspots: an experiment on a GPS dataset of Beijing taxis shows that the clustering results support evolution analysis and that taxi hotspots can be found.

In conclusion, this thesis studies and improves DenStream, a density-based data stream clustering algorithm, and deploys the improved algorithms on Storm, a distributed real-time computing system with low latency, high fault tolerance, high reliability, and scalability, to improve their processing efficiency. Experiments verify the advantages and effectiveness of the improved algorithms, and TTW-DenStream is applied to the analysis of taxi hotspots.
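The two online phases of D-DenStream described in the abstract (real-time updating at the local nodes, then merging at the global node) can be illustrated with a minimal single-process sketch. It is not the thesis's implementation. It uses the standard DenStream microcluster summary (faded weight, linear sum, squared sum) with the fading function f(Δt) = 2^(-λΔt). The names `MicroCluster`, `local_update`, and `global_merge`, the values of `LAMBDA` and `EPSILON`, and the rule "merge when the combined radius stays within ε" at the global node are all assumptions made for illustration; the abstract does not specify the merge criterion, and the Storm deployment is not modeled here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

LAMBDA = 0.25   # decay rate (assumed value, for illustration only)
EPSILON = 1.0   # maximum microcluster radius (assumed value)


def decay(dt: float) -> float:
    """DenStream fading function f(dt) = 2^(-lambda * dt)."""
    return 2.0 ** (-LAMBDA * dt)


@dataclass
class MicroCluster:
    weight: float      # faded count of absorbed points
    ls: List[float]    # faded linear sum, per dimension
    ss: List[float]    # faded squared sum, per dimension
    t: float           # timestamp of the last update

    @classmethod
    def from_point(cls, p: Sequence[float], t: float) -> "MicroCluster":
        return cls(1.0, list(p), [x * x for x in p], t)

    def faded(self, t: float) -> "MicroCluster":
        """Return a copy of this summary decayed forward to time t."""
        f = decay(t - self.t)
        return MicroCluster(self.weight * f,
                            [v * f for v in self.ls],
                            [v * f for v in self.ss], t)

    def center(self) -> List[float]:
        return [v / self.weight for v in self.ls]

    def radius(self) -> float:
        # Square root of the summed per-dimension weighted variances.
        var = sum(s / self.weight - (l / self.weight) ** 2
                  for l, s in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))

    def merged(self, other: "MicroCluster") -> "MicroCluster":
        """Union of two summaries, both decayed to the later timestamp."""
        t = max(self.t, other.t)
        a, b = self.faded(t), other.faded(t)
        return MicroCluster(a.weight + b.weight,
                            [x + y for x, y in zip(a.ls, b.ls)],
                            [x + y for x, y in zip(a.ss, b.ss)], t)


def local_update(clusters: List[MicroCluster], p: Sequence[float], t: float) -> None:
    """Local phase: absorb p into the nearest microcluster if the radius
    stays within EPSILON, otherwise start a new microcluster."""
    point = MicroCluster.from_point(p, t)
    if clusters:
        i = min(range(len(clusters)),
                key=lambda k: math.dist(clusters[k].center(), p))
        candidate = clusters[i].merged(point)
        if candidate.radius() <= EPSILON:
            clusters[i] = candidate
            return
    clusters.append(point)


def global_merge(local_sets: List[List[MicroCluster]]) -> List[MicroCluster]:
    """Global phase (assumed rule): greedily merge local microclusters
    whose union still has radius <= EPSILON."""
    merged: List[MicroCluster] = []
    for mc in (m for s in local_sets for m in s):
        for i, g in enumerate(merged):
            candidate = g.merged(mc)
            if candidate.radius() <= EPSILON:
                merged[i] = candidate
                break
        else:
            merged.append(mc)
    return merged


# Two hypothetical local nodes, each seeing part of the stream.
node_a: List[MicroCluster] = []
node_b: List[MicroCluster] = []
for t, p in enumerate([(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]):
    local_update(node_a, p, float(t))
for t, p in enumerate([(0.1, 0.2), (5.1, 4.9)]):
    local_update(node_b, p, float(t))

global_clusters = global_merge([node_a, node_b])
print(len(global_clusters))  # prints 2
```

Because the summaries are additive once they are decayed to a common timestamp, the global node can combine microclusters without access to the raw points, which is what makes the local phase parallelizable. In a Storm deployment, the local and global phases would plausibly run in separate processing components, but that mapping is not stated in the abstract.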
Keywords/Search Tags:Data stream clustering, DenStream, Distributed, Storm, Evolution analysis