Learning On Evolving Data Streams

Posted on:2021-01-08

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Salah-Ud-Din

GTID:1368330647460889

Subject:Computer Science and Technology

Abstract/Summary:

In today's digital era, a massive amount of streaming data is generated automatically and continuously in various real-world applications. Developing efficient data stream learning algorithms is difficult for the machine learning community because a learning algorithm must confront several challenges unique to streams, such as infinite length and an evolving distribution. Several algorithms have been proposed over the past decade to solve these problems. However, other, more important challenges are ignored or not adequately addressed by existing algorithms.

The first challenge is concept evolution, defined as the emergence of novel patterns. Traditional classifiers work with a fixed number of classes, but in many real-world data stream applications the number of classes may change over time. The second challenge is label scarcity: existing work often concentrates on a fully supervised learning framework, yet in practice labeling every data item in a stream is time- and resource-consuming. A more realistic situation is that only a few instances in the stream are labeled, so designing a reliable semi-supervised learning algorithm is a challenging task. The third challenge is high data dimensionality, which can significantly degrade the performance of a learning algorithm. This thesis proposes novel learning methods to address these issues. The main contributions are as follows.

1. Considering the concept evolution problem, this thesis proposes a new data stream classification algorithm for detecting and learning novel classes. The proposed algorithm handles concept drift and concept evolution simultaneously. Its main benefit is that it can handle data streams with complex class distributions and can distinguish concept drift and concept evolution from noisy instances. Extensive experiments on synthetic and real-world data sets show that the method achieves good classification and novel class detection performance (an average 5% improvement) compared to state-of-the-art algorithms.

2. In light of the label scarcity problem on data streams, this thesis proposes a new reliable online semi-supervised learning algorithm for evolving data stream classification. The proposed algorithm uses micro-clustering for data stream classification and semi-supervised learning, and an ensemble of k-NN classifiers is employed to provide robust classification. The algorithm works online, adequately handles incoming streaming data, and can be implemented on devices with low computational resources. Experimental results show that it maintains high classification performance (an average 8% improvement compared to other methods) even with a small amount of labeled data.

3. Considering the curse of dimensionality and the label scarcity problem, this thesis presents a new semi-supervised learning method for streaming data. In the proposed algorithm, a denoising autoencoder is employed to mitigate the curse of dimensionality by transforming the high-dimensional feature space into a reduced, compact, and more informative feature representation. A cluster-and-label technique is used to reduce the dependency on true class labels. The method employs a synchronization-based dynamic clustering technique to summarize the streaming data into a set of dynamic micro-clusters that are then used for classification. In addition, a disagreement-based learning method is employed to cope with concept drift. Experimental results demonstrate that the proposed algorithm achieves better performance (an average 6% improvement) than many state-of-the-art algorithms.
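
The abstract does not give implementation details, so the following is only a minimal sketch of the general micro-clustering idea it mentions: the stream is summarized into small clusters, and a few labeled instances lend their labels to nearby unlabeled ones. It is not the thesis's algorithm. The class names `MicroCluster` and `MicroClusterClassifier`, the fixed `radius` parameter, and the nearest-centroid prediction rule are all assumptions made for illustration; the k-NN ensemble, drift handling, and novel class detection described above are omitted.

```python
import math


class MicroCluster:
    """Summary of nearby points: count, per-feature sum, optional label."""

    def __init__(self, point, label=None):
        self.n = 1
        self.linear_sum = list(point)
        self.label = label  # None until a labeled instance is seen

    @property
    def center(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point, label=None):
        # Update the summary incrementally; the raw point is not stored.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        if self.label is None and label is not None:
            self.label = label


class MicroClusterClassifier:
    """Hypothetical online learner: labeled and unlabeled points share clusters."""

    def __init__(self, radius=1.0):
        self.radius = radius  # assumed fixed absorption radius
        self.clusters = []

    def _nearest(self, point, candidates):
        best, best_dist = None, math.inf
        for mc in candidates:
            d = math.dist(point, mc.center)
            if d < best_dist:
                best, best_dist = mc, d
        return best, best_dist

    def learn_one(self, point, label=None):
        mc, d = self._nearest(point, self.clusters)
        # Absorb only if the point is close and its label does not conflict.
        compatible = mc is not None and (label is None or mc.label in (None, label))
        if compatible and d <= self.radius:
            mc.absorb(point, label)
        else:
            self.clusters.append(MicroCluster(point, label))

    def predict_one(self, point):
        labeled = [mc for mc in self.clusters if mc.label is not None]
        mc, _ = self._nearest(point, labeled)
        return None if mc is None else mc.label


# Usage: two labeled points, then unlabeled points that refine the clusters.
clf = MicroClusterClassifier(radius=1.0)
clf.learn_one([0.0, 0.0], "a")
clf.learn_one([5.0, 5.0], "b")
clf.learn_one([0.2, 0.1])   # unlabeled, absorbed by the "a" cluster
clf.learn_one([5.1, 4.9])   # unlabeled, absorbed by the "b" cluster
print(clf.predict_one([0.3, 0.3]))  # prints: a
```

Storing only a count and a per-feature sum keeps memory use independent of stream length, which is one common reason such summaries are used on streams; a practical method would also need to age out or merge clusters as the data distribution changes.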

Keywords/Search Tags:

Data stream, Concept drift, New class detection, Semi-supervised classification

Related items

1	Learning On Evolving Data Streams
2	Research On Semi-supervised Data Stream Classification Method Based On Ensemble Model
3	Research On Multi-label Data Stream Semi-supervised Integrated Classification Method Based On Cooperative Training
4	Research On Concept Drift Detection In Data Stream And Classification Algorithms For Imbalanced Data Stream
5	Research On Classification Of Data Stream With Recurring Concept Drift
6	Research On Semi-supervised Classification Of Data Stream Based On Adaptive Density Clustering
7	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
8	Research On Classification Algorithms For Imbalanced Data Stream With Concept Drift
9	Research On Semi-supervised Classification Of Data Stream Based On Clustering
10	Research On Data Stream Classification Method Based On Concept Drift Detection