
Research on Evolving Data Stream Clustering

Posted on: 2021-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Peng
Full Text: PDF
GTID: 2428330623968532
Subject: Engineering
Abstract/Summary:
As an important branch of data mining, data stream mining has long been a research hotspot, and existing work has produced results of both theoretical and practical value. The key task in data stream mining is to extract valuable knowledge in real time, in a single scan, from a massive, continuous, and dynamically evolving data stream. However, most algorithms assume fully labeled data and make strong assumptions about the form the stream's evolution takes (such as concept drift, concept evolution, or feature evolution), which greatly limits the application of data stream mining in many real-world scenarios. An important task in current data stream mining is therefore to establish a reliable adaptive clustering algorithm that works when labels are missing and adapts quickly to multiple forms of evolution. The main work of this thesis covers adaptive unsupervised learning on concept-evolving data streams and feature evolution in complex data streams. Its main contributions are the following three.

Firstly, in concept-evolving data streams, an inappropriate sliding-window size (or decay rate) leads to poor clustering performance and to clustering results that do not correctly reflect the current data distribution. To address this, the thesis introduces the concept of a cluster life cycle and proposes a cluster life cycle learning (CLL) algorithm that predicts when each class appears in and disappears from the data stream. The algorithm adaptively learns a forgetting function for each weighted micro-cluster and uses it to adjust that micro-cluster's weight decay rate: it accelerates the decay of micro-clusters that represent outdated concepts and slows the decay of micro-clusters that correctly reflect the current data distribution, which effectively improves clustering performance.

Secondly, traditional window-based methods for detecting cluster evolution suffer from delayed and missed detections. Building on adaptive learning of the cluster life cycle, the thesis proposes a dynamic clustering algorithm based on minimum spanning trees. The algorithm monitors the evolution of the cluster structure online by maintaining the tree structures and the relationships between trees in real time, which improves the timeliness and accuracy of cluster evolution detection.

Thirdly, feature evolution in data streams poses a complex optimization problem, and existing methods do not consider concept drift and feature evolution at the same time. To address both issues, the thesis proposes FEMC, a feature-evolving learning framework based on the micro-cluster structure. The basic idea is to learn, from a metric-learning perspective, a weight vector over the retained features and to compress the information contained in the vanished features onto the retained ones, so that the model built on the original feature space remains valid. Combined with a lazy learning model, FEMC is applied to data stream classification and to data stream clustering. FEMC provides effective support for improving the reliability and adaptability of learning algorithms on data streams.
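The abstract does not give the form of the forgetting function that CLL learns, so the following is only an illustrative sketch of the general idea of a per-micro-cluster decay rate. It assumes a conventional exponential fading rule, weight × 2^(−λ·Δt), of the kind commonly used in stream clustering; the class name `MicroCluster`, its fields, and the default value of `decay_rate` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Weighted micro-cluster carrying its own decay rate (hypothetical structure)."""
    center: list[float]
    weight: float = 1.0
    decay_rate: float = 0.01   # per-cluster lambda; assumed value, not from the thesis
    last_update: float = 0.0

    def fade(self, now: float) -> None:
        """Exponential forgetting: weight *= 2 ** (-decay_rate * elapsed time)."""
        elapsed = now - self.last_update
        self.weight *= 2.0 ** (-self.decay_rate * elapsed)
        self.last_update = now

    def absorb(self, point: list[float], now: float) -> None:
        """Fade to the current time, then fold the point into the weighted center."""
        self.fade(now)
        total = self.weight + 1.0
        self.center = [
            (c * self.weight + x) / total for c, x in zip(self.center, point)
        ]
        self.weight = total


# Two clusters with equal weight but different decay rates: the one treated as
# outdated (larger decay_rate) loses weight faster over the same time span.
outdated = MicroCluster(center=[0.0, 0.0], weight=4.0, decay_rate=1.0)
current = MicroCluster(center=[5.0, 5.0], weight=4.0, decay_rate=0.1)
outdated.fade(now=2.0)
current.fade(now=2.0)
```

In this sketch the decay rates are fixed by hand. In an adaptive scheme of the kind the abstract describes, each micro-cluster's rate would instead be learned from the stream, which is the part this example does not attempt to reproduce.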
Keywords/Search Tags:data stream mining, concept evolution stream, feature evolution stream, data stream clustering