Research On Class Incremental Learning And Concept Drift Detection In Multi-label Data Streams Classification

Posted on:2016-11-04

Degree:Master

Type:Thesis

Country:China

Candidate:Z W Shi

Full Text:PDF

GTID:2308330479497161

Subject:Computer Science and Technology

Abstract/Summary:

With the arrival of the big data era, data volumes are exploding and data types are becoming more diverse. Multi-label data streams are increasingly common in real-world applications such as email classification, news feeds, medical diagnosis, and image recognition. Because multi-label data streams combine high speed, huge volume, concept drift, samples with multiple labels, and label dependence, they require a learning model to handle the properties of both data streams and multi-label data. This poses new challenges for traditional research on concept drift detection and classification in data streams.

In a multi-label data stream, each sample can be associated with multiple class labels, and the stream can be transformed into one or more single-label data streams by a problem transformation method. However, this transformation is ill-suited to data streams, because the label combinations of samples can change over time as the stream evolves. The dependence between labels also changes over time.

To address these issues, this thesis makes three contributions:

(1) The multi-label stream classification algorithm EaHTps tries to take label dependence into account, but it considers only the existing frequent label combinations, which degrades multi-label classification performance. To deal with this problem, the thesis proposes an algorithm based on class incremental learning, which dynamically recognizes new frequent label combinations and updates the trained classifier with a class incremental learning strategy. Experimental results demonstrate its better predictive performance.

(2) Based on an analysis of the correlation between samples and labels in multi-label data, the thesis proposes a concept drift detection algorithm for multi-label data streams based on the probability of relevance. The basic idea starts from the cause of concept drift: the distribution of the data stream is described by the probability of relevance between samples and labels, and drift is detected by monitoring the change in distribution between old data and new data. Experimental results show that the proposed algorithm detects concept drift rapidly and accurately.

(3) Based on a detailed analysis of the label dependence that commonly exists in multi-label data streams, the thesis puts forward a concept drift detection method based on label grouping and entropy. Label dependence usually comprises the correlation between labels and the interdependence between features and the label set. To handle this property, the proposed algorithm uses a label grouping technique to partition the label set into subsets, each containing labels that are correlated and interdependent. It then uses entropy to measure the distributional relationship between features and the multiple label subsets. In addition, a threshold method is introduced to detect concept drift by estimating whether the distribution of samples is changing. Verification experiments on synthetic datasets with different types of concept drift show that considering the dependence between labels is beneficial for detecting concept drift. The final comparative experiments show that the proposed algorithm outperforms the baseline methods in handling concept drift in multi-label data stream scenarios.
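
The general idea of entropy-based drift detection with a threshold can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract gives no formulas, so the code below assumes a much simpler setup in which each window is a list of label sets, the distribution is summarized by the Shannon entropy of the label combinations, and drift is flagged when the entropy gap between an old and a new window exceeds a threshold. The names `labelset_entropy` and `drift_detected`, the window contents, and the threshold value 0.5 are all hypothetical, and label grouping and feature information are omitted entirely.

```python
import math
from collections import Counter


def labelset_entropy(window):
    """Shannon entropy (in bits) of the label-combination distribution.

    `window` is a list of label sets, one per sample (assumed format).
    """
    counts = Counter(frozenset(labels) for labels in window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def drift_detected(old_window, new_window, threshold=0.5):
    """Flag drift when the entropy gap between two windows exceeds the threshold.

    The threshold value is an illustrative assumption, not taken from the thesis.
    """
    gap = abs(labelset_entropy(new_window) - labelset_entropy(old_window))
    return gap > threshold


# Toy windows: the old window is dominated by one label combination,
# while the new window spreads evenly over four combinations.
old = [{"sports"}, {"sports"}, {"sports", "news"}, {"sports"}]
new = [{"sports"}, {"news"}, {"sports", "news"}, {"tech"}]

print(drift_detected(old, new))  # entropy rises from about 0.81 to 2.0 bits
print(drift_detected(old, old))  # identical windows, no gap
```

A single entropy value is a coarse summary: two different distributions can share the same entropy, so a sketch like this can miss drift that a comparison of the full distributions would catch.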

Keywords/Search Tags:

Multi-label data streams, Concept drift, Class incremental learning, Probability of relevance, Label dependence, Entropy

Related items

1	Research On Classification Of Multi-Label Data Streams
2	Research On Classification For Data Streams With Concept Drift
3	Concept Drift Detection Algorithm Based On Multi-label Learning With Label Special Features
4	Research On Multi-label Data Stream Classification Method Based On Kernel Extreme Learning Machine
5	Research On Multi-label Data Stream Semi-supervised Integrated Classification Method Based On Cooperative Training
6	Contributions To Several Issues Of Multi-Label Learning
7	The Research And Implementation Of User Attribute Streaming Prediction Based On Multi-label Learning
8	Class-imbalance Issue In Applying Multi-label Learning To The Study Of Parkinson In Traditional Chinese Medicine Diagnosis
9	Research On Multi-label Learning And Algorithms Based On Data And Label Correlations
10	A Research On Emerging New Label And Incremental Learning In Mulit-label Data Stream