
Learning On Evolving Data Streams

Posted on: 2021-01-08 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Salah-Ud-Din | Full Text: PDF
GTID: 1368330647460889 | Subject: Computer Science and Technology
Abstract/Summary:
In today's digital era, real-world applications automatically and continuously generate massive amounts of streaming data. Developing efficient data stream learning algorithms is difficult for the machine learning community because a learning algorithm must confront properties unique to streams, such as unbounded length and an evolving data distribution. Several algorithms have been proposed over the past decade to address these problems. However, other important challenges are ignored or not adequately addressed by existing algorithms.

The first challenge is concept evolution, defined as the emergence of novel patterns. Traditional classifiers work with a fixed number of classes, but in many real-world data stream applications the number of classes may change over time. The second challenge is label scarcity. Existing works often concentrate on a supervised learning framework, yet in practice labeling every item in a data stream is time-consuming and resource-intensive. A more realistic situation is that only a few instances in the stream are labeled, so designing a reliable semi-supervised learning algorithm is a challenging task. The third challenge is high data dimensionality, which can significantly degrade the performance of a learning algorithm.

This thesis proposes novel learning methods to address these issues. The main contributions are as follows.

1. To address concept evolution, this thesis proposes a new data stream classification algorithm for detecting and learning novel classes. The proposed algorithm handles concept drift and concept evolution simultaneously. Its main benefit is that it can handle data streams with complex class distributions and distinguish concept drift and concept evolution from noisy instances. Extensive experiments on synthetic and real-world data sets show that the method achieves good classification and novel class detection performance (average 5% improvement) compared to state-of-the-art algorithms.

2. To address label scarcity in data streams, this thesis proposes a new reliable online semi-supervised learning algorithm for evolving data stream classification. The proposed algorithm uses micro-clustering for data stream classification and semi-supervised learning. Furthermore, an ensemble of k-NN classifiers is employed to provide robust classification. The algorithm processes incoming streaming data online and can be implemented on devices with low computational resources. Experimental results show that it achieves high classification performance (average 8% improvement compared to other algorithms) even with a small amount of labeled data.

3. To address the curse of dimensionality together with label scarcity, this thesis presents a new semi-supervised learning method for streaming data. In the proposed algorithm, a denoising autoencoder mitigates the curse of dimensionality by transforming the high-dimensional feature space into a reduced, compact, and more informative feature representation. Furthermore, a cluster-and-label technique is used to reduce the dependency on true class labels. The proposed method employs a synchronization-based dynamic clustering technique to summarize the streaming data into a set of dynamic micro-clusters that are then used for classification. In addition, a disagreement-based learning method is employed to cope with concept drift. Experimental results demonstrate that the proposed algorithm achieves better performance (average 6% improvement) than many state-of-the-art algorithms.
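The micro-clustering idea mentioned in the contributions can be illustrated with a minimal sketch. This is not the thesis's algorithm: it is a simplified nearest-micro-cluster classifier, and the class names (`MicroCluster`, `MicroClusterClassifier`), the fixed `radius` threshold, and the outlier rule are all assumptions introduced here for illustration. It shows how a stream can be summarized incrementally, how unlabeled instances can be absorbed into existing summaries, and how an instance far from every summary can be flagged as a candidate for a novel pattern.

```python
import math


class MicroCluster:
    """Running summary of nearby stream instances: centroid, count, optional label."""

    def __init__(self, point, label=None):
        self.center = list(point)
        self.count = 1
        self.label = label  # None means no labeled instance has been seen yet

    def distance(self, point):
        return math.dist(self.center, point)

    def absorb(self, point, label=None):
        # Incremental mean update, so no past instances need to be stored.
        self.count += 1
        self.center = [c + (x - c) / self.count
                       for c, x in zip(self.center, point)]
        if self.label is None and label is not None:
            self.label = label


class MicroClusterClassifier:
    """Hypothetical sketch: classify by the nearest labeled micro-cluster."""

    def __init__(self, radius=1.0):
        self.radius = radius  # assumed fixed absorption threshold
        self.clusters = []

    def predict(self, point):
        labeled = [mc for mc in self.clusters if mc.label is not None]
        if not labeled:
            return None
        return min(labeled, key=lambda mc: mc.distance(point)).label

    def is_outlier(self, point):
        # An instance outside every micro-cluster is a novel-pattern candidate.
        return all(mc.distance(point) > self.radius for mc in self.clusters)

    def learn(self, point, label=None):
        # Labeled and unlabeled instances both update the summaries.
        if self.clusters:
            nearest = min(self.clusters, key=lambda mc: mc.distance(point))
            compatible = label is None or nearest.label in (None, label)
            if nearest.distance(point) <= self.radius and compatible:
                nearest.absorb(point, label)
                return
        self.clusters.append(MicroCluster(point, label))


clf = MicroClusterClassifier(radius=1.0)
clf.learn([0.0, 0.0], "a")   # labeled instance
clf.learn([5.0, 5.0], "b")   # labeled instance
clf.learn([0.2, 0.0])        # unlabeled instance, absorbed by the nearest summary
print(clf.predict([0.3, 0.1]), clf.is_outlier([10.0, 10.0]))
```

A fixed radius keeps the sketch short. A practical stream learner would also need to age out stale micro-clusters and adapt the threshold, which this example does not attempt.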
Keywords/Search Tags:Data stream, Concept drift, New class detection, Semi-supervised classification