
Improvement of a Streaming-Data-Oriented Clustering Algorithm and Its Realization as a Service

Posted on: 2021-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Zhang
GTID: 2428330611480620
Subject: Computer science and technology
Abstract/Summary:
With the rapid development of industrial informatization and sensor networks, continuous real-time data streams are generated in many fields, such as network monitoring, industrial control, stock trading, and Internet communication. These streams contain a large amount of valuable information. Data mining has become an active research topic for streaming data because it can extract useful information from massive data, and cluster analysis of real-time stream data is one of its main focuses. Clustering divides a data set into several subsets, called clusters or categories, so that objects in the same cluster are as similar as possible and objects in different clusters are as dissimilar as possible. Dividing the data set in this way helps identify hidden patterns, abnormal data, and fluctuation events in the stream.

The CluStream algorithm proposes a two-stage clustering framework that scans stream data in a single pass. In the online update phase, micro-cluster snapshots store summary information about the clustering; in the offline analysis phase, the pyramidal time frame is used to answer clustering requests at different granularities. However, CluStream does not consider the influence of historical data when partitioning windows and updating the cluster structure, so it cannot reflect the difference in importance between old and new data. In addition, the fixed total number of micro-clusters limits how it handles the evolution of cluster features, so it fails to reflect the splitting and merging of clusters in time.

This paper proposes an improved clustering algorithm based on CluStream that improves the accuracy and performance of the original algorithm while effectively identifying new and old clusters of different classes. It also proposes a servitization model for cluster analysis of stream data to address the scaling and extension of stream data mining. The main research work and contributions of this paper are as follows:

1. To address the problem that CluStream does not consider the influence of historical data on the weight, an attenuation function is introduced and a periodic adaptive iteration strategy is added to dynamically adjust the global micro-cluster structure, improving the accuracy and performance of the algorithm.
2. To address the performance limits of the traditional standalone architecture, a real-time clustering algorithm for streaming data in a distributed environment is proposed, which can effectively improve clustering performance on large-scale data streams and shorten the overall mining time.
3. Through servitization modeling of streaming data clustering, a prototype system for the streaming data clustering service is implemented, which supports the sharing and extension of streaming data and provides composite monitoring of the services.
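
As an illustration of the first contribution, the sketch below shows one way an attenuation function can be applied to a CluStream-style micro-cluster. The abstract does not state the form of the function, so the exponential decay 2^(-lambda * dt), a common choice in damped-window stream clustering, is an assumption here, as are the class name `DecayedMicroCluster` and the parameter `decay_rate`. It is a minimal sketch, not the thesis's implementation.

```python
class DecayedMicroCluster:
    """Hypothetical micro-cluster whose statistics fade over time.

    Keeps a decayed point count (weight), linear sum, and squared sum,
    in the spirit of CluStream's cluster feature vectors.
    """

    def __init__(self, dim, decay_rate=0.1):
        self.decay_rate = decay_rate      # assumed lambda, not from the thesis
        self.weight = 0.0                 # decayed number of points
        self.linear_sum = [0.0] * dim     # decayed sum of coordinates
        self.square_sum = [0.0] * dim     # decayed sum of squared coordinates
        self.last_update = 0.0            # timestamp of the last fade

    def _fade(self, now):
        # Assumed attenuation function: 2^(-lambda * elapsed_time).
        factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.square_sum = [v * factor for v in self.square_sum]
        self.last_update = now

    def insert(self, point, now):
        # Fade the old statistics first, then absorb the new point at full weight.
        self._fade(now)
        self.weight += 1.0
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x

    def center(self):
        # Weighted centroid; newer points dominate because old ones have faded.
        return [v / self.weight for v in self.linear_sum]


mc = DecayedMicroCluster(dim=1, decay_rate=1.0)
mc.insert([0.0], now=0.0)
mc.insert([10.0], now=1.0)
print(mc.weight, mc.center())
```

With `decay_rate=1.0`, the first point's contribution is halved after one time unit, so the centroid is pulled toward the newer point rather than sitting at the plain average of 5.0.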
Keywords/Search Tags: streaming data, clustering, real-time computing, distributed system