Adaptive Evolving Data Stream Algorithm Based On Time Decay Window

Posted on: 2022-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Tang
GTID: 2518306491485474
Subject: Master of Engineering Computer Technology
Abstract/Summary:
Data in the real world is generated continuously, and practical projects must process ever larger data streams. Because most real-world data is unlabeled, an efficient unsupervised learning algorithm is needed to process and identify data streams automatically. Clustering is an unsupervised learning method that divides a set of objects into classes of similar objects. On a stream, a clustering algorithm must recluster continuously, so unsupervised data stream clustering algorithms have been proposed to handle evolving data streams. A data stream has the following characteristics: rapidity, irreproducibility, evolution, and temporality. These impose the following requirements on a clustering algorithm: efficient processing speed, discovery of new trends and new categories, determination of the best number of clusters, reduction of the impact of noise, and handling of the effect of time on the data. A data stream algorithm usually has two modes: the online mode receives and stores data, and the offline mode processes and mines it.

This work focuses on two aspects. First, the online mode of the data stream algorithm is improved. The online-mode parameters have traditionally been set manually, which makes them sensitive to the user's experience and to the evolution of the data over time. A parameter-adaptive method is therefore proposed that automatically updates the parameters according to the results of the offline data analysis, so that the parameter settings adapt to the evolution of the stream. Second, the offline mode of the data stream algorithm is improved. To address the problems that offline clustering algorithms require the number of categories to be set manually and have high time complexity, the micro-cluster group is processed with an improved clustering feature (ICF). After the analysis, the resulting ICF-tree is used to obtain the number of categories and the clustering results, and the parameters of the online mode are updated from the leaf-node information.

In this thesis, the data stream algorithm model is improved to address missing labels and data evolution in streaming environments. The model parameters are updated from the data analysis results, and an ICF-tree is built to improve processing efficiency while preserving the quality of the results. Experiments on real and synthetic data sets show that the proposed data stream algorithm is more accurate and efficient in data stream processing and is more robust to noise.
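
The abstract does not give the update formulas, so the following is only a minimal sketch of the general idea behind a time-decayed micro-cluster summary, as commonly kept by the online mode of stream clustering algorithms. The class name `DecayedMicroCluster`, the `decay_rate` parameter, and the exponential fading factor 2^(-decay_rate * elapsed_time) are assumptions made for illustration; they are not taken from the thesis, and the ICF structure itself is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecayedMicroCluster:
    """Hypothetical micro-cluster summary whose statistics fade over time."""
    dim: int
    decay_rate: float          # assumed decay parameter, not from the thesis
    weight: float = 0.0        # faded count of absorbed points
    linear_sum: List[float] = field(default_factory=list)
    last_update: float = 0.0   # timestamp of the most recent update

    def __post_init__(self) -> None:
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim

    def _fade(self, now: float) -> None:
        # Older contributions are scaled down by 2^(-decay_rate * elapsed).
        factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.last_update = now

    def insert(self, point: List[float], now: float) -> None:
        # Fade the existing summary first, then absorb the new point at full weight.
        self._fade(now)
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

    def center(self) -> List[float]:
        if self.weight == 0.0:
            raise ValueError("empty micro-cluster has no center")
        return [s / self.weight for s in self.linear_sum]


mc = DecayedMicroCluster(dim=2, decay_rate=1.0)
mc.insert([0.0, 0.0], now=0.0)
mc.insert([2.0, 2.0], now=1.0)   # the first point has faded to weight 0.5
print(mc.weight, mc.center())
```

In this sketch a larger `decay_rate` makes the summary forget old points faster, which is the kind of parameter that an adaptive scheme could adjust from offline results instead of fixing it by hand.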
Keywords/Search Tags:Data stream mining, Parameter self-adaptation, Data stream clustering, Data evolution