Adaptive Evolving Data Stream Algorithm Based On Time Decay Window

Posted on:2022-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:Z K Tang

GTID:2518306491485474

Subject:Master of Engineering Computer Technology

Abstract/Summary:

Data in the real world is generated continuously, and practical projects must process ever larger data streams. Since most real-world data is unlabeled, an efficient unsupervised learning algorithm is needed to process and identify data streams automatically. Clustering is an unsupervised learning method that divides a set of objects into classes of similar objects. Because a stream keeps changing, a clustering algorithm must recluster continuously, so unsupervised data stream clustering algorithms have been proposed to handle evolving data streams.

A data stream has the following characteristics: rapidity, irreproducibility, evolution, and temporality. These impose the following requirements on a clustering algorithm: efficient processing speed, discovery of new trends and new categories, determination of the best number of clusters, reduction of the impact of noise, and handling of the effect of time on the data. A data stream algorithm usually has two modes: the online mode is responsible for receiving and storing data, and the offline mode is responsible for processing and mining it.

This work focuses on two aspects. First, the online mode of the data stream algorithm is improved. The parameters of the online mode have always been set manually, which makes them sensitive to the practitioner's experience and to the evolution of the data over time. A parameter-adaptive method is therefore proposed that automatically updates the parameters according to the results of the offline data analysis, so that the parameter settings adapt to the evolution of the stream. Second, the offline mode of the data stream algorithm is improved. Offline clustering algorithms require the number of categories to be set manually and have high time complexity. To address this, the micro-clusters are processed with an improved clustering feature (ICF). The resulting ICF-tree is used to obtain the number of categories and the clustering results, and the parameters of the online mode are updated from the leaf node information.

In this thesis, the data stream algorithm model is improved to address missing labels and data evolution in a streaming data environment. The model parameters are updated from the data analysis results, and an ICF-tree is built to improve the efficiency of data stream processing while preserving the quality of the results. Experimental results on real and synthetic data sets show that the proposed data stream algorithm is more accurate and efficient in data stream processing and has stronger resistance to noise.
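
The online mode and the time decay named in the title can be illustrated with a small sketch. It is not the thesis's ICF, which the abstract does not define. It assumes a conventional clustering feature (weight, linear sum, squared sum) and an exponential fading factor 2^(-lambda * dt) of the kind commonly used in stream clustering. The class name `DecayedMicroCluster` and the parameter `decay_rate` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DecayedMicroCluster:
    """Clustering feature whose statistics fade exponentially with time."""
    dim: int
    decay_rate: float          # hypothetical lambda; larger values forget faster
    weight: float = 0.0        # decayed number of points
    linear_sum: list = field(default_factory=list)
    squared_sum: list = field(default_factory=list)
    last_update: float = 0.0

    def __post_init__(self):
        # Start from an empty summary if no statistics were supplied.
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim
        if not self.squared_sum:
            self.squared_sum = [0.0] * self.dim

    def _fade(self, now):
        # Scale all statistics by 2^(-lambda * elapsed time).
        factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.squared_sum = [v * factor for v in self.squared_sum]
        self.last_update = now

    def insert(self, point, now):
        # Decay the old summary first, then absorb the new point with weight 1.
        self._fade(now)
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.squared_sum = [s + x * x for s, x in zip(self.squared_sum, point)]

    def center(self):
        return [s / self.weight for s in self.linear_sum]

    def radius(self):
        # Root of the summed per-dimension decayed variance.
        var = sum(q / self.weight - (s / self.weight) ** 2
                  for s, q in zip(self.linear_sum, self.squared_sum))
        return math.sqrt(max(var, 0.0))


mc = DecayedMicroCluster(dim=1, decay_rate=1.0)
mc.insert([0.0], now=0.0)
mc.insert([3.0], now=1.0)   # the first point has faded to weight 0.5 by now
print(mc.weight, mc.center(), mc.radius())
```

In a sketch like this, `decay_rate` is exactly the kind of manually set online parameter that the thesis proposes to update from the offline analysis. How that update is computed is not stated in the abstract and is not modeled here.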

Keywords/Search Tags:

Data stream mining, Parameter self-adaptation, Data stream clustering, Data evolution

Related items

1	Research Of Evolving Data Stream Clustering
2	Study On Data Stream Techniques And Its Application In Electric Power Information Processing
3	Research On Industrial Data Stream Oriented Parameter Adaptive Real-time Clustering Algorithm
4	Research On An Application Of Data Stream Query And Data Stream Mining In Oil Field
5	Research On Data Stream Clustering And Its Applications Based On Correlations
6	A Density-Based Clustering Algorithm Over Stream Data
7	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
8	Research On Dynamic Measurement Based Data Stream Clustering And Its Applications
9	Data Stream Clustering Algorithm Analysis Using A Flock Of Agents
10	Study On Clustering Algorithm Adapt To High-speed Data Stream