
Research On Data Stream Clustering Based On Storm

Posted on: 2018-10-08 | Degree: Master | Type: Thesis
Country: China | Candidate: W Li
GTID: 2348330542467835 | Subject: Management Science and Engineering
Abstract/Summary:
With the continuous development of science and technology and the ever wider and deeper adoption of applications, the volume of generated data is growing sharply. Much of this data is dynamic and arrives as a stream, and it must be processed, mined, and analyzed in real time. Many data stream clustering algorithms exist, but they still have problems: they cannot adapt to changes in stream speed, they cluster large-scale data streams inefficiently, and their clustering quality is poor under distributed parallel conditions. In recent years cloud computing platforms have emerged and matured, offering good distributed parallel computing capability. Data mining and analysis based on cloud platform technology has gained wide attention and recognition, providing a new way to improve the efficiency of data stream clustering. However, because Storm has a relatively short development history, work on data stream clustering built on it is rare.

Based on the requirements and characteristics of data stream clustering, and on a comparative analysis of several cloud computing models, this thesis presents a Storm-based data stream clustering model. Using CluStream as the base data stream clustering algorithm, it introduces the concept of micro-cluster density and a sliding window whose time span can be adjusted dynamically. On top of this clustering model, the S-CluStream algorithm is designed and implemented. Within CluStream's two-stage (online/offline) clustering framework, the online clustering process is divided into local micro-cluster updating and global micro-cluster merging. S-CluStream therefore divides data stream clustering into four processes: (1) determining the initial micro-clusters, (2) updating the local micro-clusters in real time, (3) merging the local micro-clusters globally, and (4) clustering the global micro-clusters. Together these realize real-time cluster analysis of the data stream. To test the effectiveness of the Storm-based algorithm, a Storm experimental cluster was designed and built, and the algorithm was evaluated in terms of evolution, clustering quality, and clustering efficiency. The experimental results show that the algorithm tracks the evolution of the data stream and that both clustering quality and clustering efficiency are improved.
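The abstract builds S-CluStream on CluStream micro-clusters but does not give its formulas, so the following is only a minimal sketch under stated assumptions. It shows the standard CluStream cluster feature vector (point count, per-dimension linear and squared sums, and linear and squared sums of timestamps) and why that vector suits a split into local updating and global merging: the vectors are additive, so micro-clusters built independently by parallel workers (in Storm, separate bolt tasks) can be combined by summing. The functions `local_update` and `global_merge` and the parameters `boundary`, `default_radius`, and `q` are hypothetical. The absorb rule is simplified, the thesis's micro-cluster density and dynamically adjustable sliding window are not modeled, and everything runs in a single Python process rather than on Storm.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    """CluStream-style cluster feature vector for d-dimensional points."""
    n: int            # number of points absorbed
    ls: List[float]   # per-dimension linear sum of the points
    ss: List[float]   # per-dimension sum of squared coordinates
    lst: float        # linear sum of timestamps
    sst: float        # sum of squared timestamps

    @classmethod
    def from_point(cls, x: Sequence[float], t: float) -> "MicroCluster":
        return cls(1, list(x), [v * v for v in x], t, t * t)

    def absorb(self, x: Sequence[float], t: float) -> None:
        # Incremental update: every statistic is a running sum.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        self.lst += t
        self.sst += t * t

    def merge(self, other: "MicroCluster") -> None:
        # Additivity: the vector of a union of two point sets is the sum of their vectors.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]
        self.lst += other.lst
        self.sst += other.sst

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def rms_radius(self) -> float:
        # Root-mean-square distance of the absorbed points from the centroid.
        var = sum(sq / self.n - (lin / self.n) ** 2 for sq, lin in zip(self.ss, self.ls))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def local_update(mcs, x, t, boundary=2.0, default_radius=1.0):
    """Hypothetical per-worker step: absorb x into the nearest micro-cluster if it lies
    within boundary * RMS radius, otherwise open a new micro-cluster. Assumption: a
    singleton uses a fixed default_radius (CluStream itself uses the distance to the
    nearest other micro-cluster), and micro-clusters are never expired or deleted."""
    if mcs:
        nearest = min(mcs, key=lambda m: dist(m.centroid(), x))
        r = nearest.rms_radius() if nearest.n > 1 else default_radius
        if dist(nearest.centroid(), x) <= boundary * r:
            nearest.absorb(x, t)
            return
    mcs.append(MicroCluster.from_point(x, t))


def global_merge(local_sets, q):
    """Hypothetical global step: pool the micro-clusters reported by all workers and
    merge the closest pair (by centroid distance) until at most q remain (q >= 1)."""
    pool = [m for s in local_sets for m in s]
    while len(pool) > q:
        i, j = min(
            ((i, j) for i in range(len(pool)) for j in range(i + 1, len(pool))),
            key=lambda p: dist(pool[p[0]].centroid(), pool[p[1]].centroid()),
        )
        pool[i].merge(pool[j])  # note: mutates the worker's micro-cluster in place
        del pool[j]             # j > i, so index i is still valid
    return pool


# Two simulated workers, each seeing its own share of the stream.
worker_a, worker_b = [], []
for t, x in enumerate([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]):
    local_update(worker_a, x, t)
for t, x in enumerate([(0.0, 0.1), (5.1, 5.0)]):
    local_update(worker_b, x, t)
merged = global_merge([worker_a, worker_b], q=2)
print(sorted(m.n for m in merged))  # prints [2, 3]
```

Because merging is plain vector addition, the global stage never needs the raw points, only the summaries each worker emits; that is the property a local-update / global-merge split relies on. Offline macro-clustering (for example, weighted k-means over the merged centroids) would then run on the `merged` list.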
Keywords/Search Tags: Data Stream Clustering, Storm Platform, Data Stream Computing