
Research On Class Incremental Learning And Concept Drift Detection In Multi-label Data Streams Classification

Posted on: 2016-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Shi
GTID: 2308330479497161
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the big data era, data volumes are exploding and data types are becoming more diverse. Multi-label data streams have become increasingly common in real-world applications such as email classification, news feeds, medical diagnosis, and image recognition. Multi-label data streams combine the properties of data streams (high speed, huge volume, concept drift) with those of multi-label data (samples with multiple labels, label dependence), so a learning model must handle the unique properties of both. This poses new challenges for traditional research on concept drift detection and classification in data streams.

In multi-label data streams, a sample can be associated with multiple class labels, and a multi-label data stream can be transformed into one or more single-label data streams by a problem transformation method. However, this transformation is ill-suited to data stream scenarios, because the label combinations of samples can change over time in an evolving stream. In addition, the dependence between labels also changes over time in multi-label data streams.

To address these issues, this thesis makes three contributions:

(1) An analysis of the multi-label stream classification algorithm EaHTps shows that, although it tries to take label dependence into account, it focuses only on the existing frequent label combinations, which degrades its multi-label classification performance. To deal with this problem, the thesis proposes an algorithm based on class incremental learning, which dynamically recognizes new frequent label combinations and updates the trained classifier with a class incremental learning strategy.
Experimental results demonstrate its better predictive performance.

(2) Based on an analysis of the correlation between samples and labels in multi-label data, the thesis proposes a concept drift detection algorithm for multi-label data streams based on the probability of relevance. Starting from the causes of concept drift, the algorithm describes the distribution of a data stream by the probability of relevance between samples and labels, and then decides whether concept drift has occurred by monitoring the change in distribution between old and new data. Experimental results show that the proposed algorithm detects concept drift quickly and accurately.

(3) Based on a detailed analysis of the label dependence that commonly exists in multi-label data streams, the thesis puts forward a concept drift detection method for multi-label data streams based on label grouping and entropy. Label dependence usually comprises both the correlation between labels and the interdependence between features and the label set. To handle this property, the proposed algorithm uses a label grouping technique to partition the label set into subsets, each containing labels that are correlated and interdependent. It then uses entropy to measure the relationship between the distribution of the features and that of the multiple label subsets. In addition, a threshold method is introduced to detect concept drift by estimating whether the distribution of samples is changing. Verification experiments on synthetic datasets with different types of concept drift show that considering the dependence between labels is beneficial for detecting concept drift.
Final comparative experiments show that the proposed algorithm outperforms the baseline methods in handling concept drift in multi-label data stream scenarios.
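Contribution (1) depends on noticing when a new label combination has become frequent enough to be treated as a class of its own. The sketch below shows one way to do this under stated assumptions: the classifier is taken to be label-powerset style, with each label combination as one class (an assumption; the abstract only says a problem transformation method is used), labelsets are counted in a fixed-size sliding window, and any combination whose relative frequency reaches a minimum support and is not yet a known class is reported so that a class incremental update could add it. The class `LabelsetTracker` and its parameters `window_size` and `min_support` are hypothetical names; the thesis's actual recognition rule and the classifier update itself are not reproduced here.

```python
from collections import Counter, deque
from typing import Deque, FrozenSet, Iterable, List, Set

LabelSet = FrozenSet[str]


class LabelsetTracker:
    """Hypothetical helper: counts labelsets (label combinations) in a
    sliding window and reports combinations that have become frequent
    but are not yet classes of the label-powerset classifier."""

    def __init__(self, known: Iterable[Iterable[str]], window_size: int,
                 min_support: float) -> None:
        self.known: Set[LabelSet] = {frozenset(k) for k in known}
        self.window: Deque[LabelSet] = deque(maxlen=window_size)
        self.counts: Counter = Counter()
        self.min_support = min_support  # fraction of the window, e.g. 0.5

    def update(self, labels: Iterable[str]) -> List[LabelSet]:
        labelset = frozenset(labels)
        # When the window is full, the deque drops its oldest item on
        # append, so remove that item's count first.
        if len(self.window) == self.window.maxlen:
            oldest = self.window[0]
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]
        self.window.append(labelset)
        self.counts[labelset] += 1

        # Only judge frequency once the window is full.
        if len(self.window) < self.window.maxlen:
            return []
        needed = self.min_support * len(self.window)
        new_classes = [ls for ls, c in self.counts.items()
                       if c >= needed and ls not in self.known]
        # In a full learner, each new class would trigger a class
        # incremental update of the classifier here (not implemented).
        self.known.update(new_classes)
        return new_classes


# Usage: {"b", "c"} starts unknown and becomes frequent in the window.
tracker = LabelsetTracker(known=[{"a"}, {"a", "b"}], window_size=4,
                          min_support=0.5)
stream = [{"a"}, {"c"}, {"b", "c"}, {"b", "c"}, {"b", "c"}]
results = [tracker.update(labels) for labels in stream]
print(results)
```

The combination {"b", "c"} is reported once, at the fourth sample, when it fills half of the window; afterwards it is a known class and is not reported again.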
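Contribution (3) combines three steps: grouping correlated labels, using entropy to measure how each label subset relates to the features, and flagging drift with a threshold. The following is a minimal sketch assuming (i) labels are grouped by pairwise co-occurrence (Jaccard similarity of at least `tau`, merged with union-find), (ii) each sample's features have already been discretized to a single bin id, and (iii) drift in a group is declared when the conditional entropy H(Y_g | X) of the group's label sub-combination given the feature bin differs between a reference window and a current window by more than `threshold`. These choices, and the names `group_labels`, `conditional_entropy` and `drifting_groups`, are illustrative assumptions rather than the thesis's exact formulation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Set, Tuple

Sample = Tuple[Hashable, Set[str]]  # (discretized feature bin, labelset)


def group_labels(labelsets: Sequence[Set[str]], labels: Sequence[str],
                 tau: float) -> List[frozenset]:
    """Assumed grouping rule: merge labels whose co-occurrence Jaccard
    similarity is >= tau, using union-find."""
    parent = {label: label for label in labels}

    def find(label: str) -> str:
        while parent[label] != label:
            parent[label] = parent[parent[label]]  # path halving
            label = parent[label]
        return label

    occurs = {label: sum(label in ls for ls in labelsets) for label in labels}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            both = sum(a in ls and b in ls for ls in labelsets)
            either = occurs[a] + occurs[b] - both
            if either and both / either >= tau:
                parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for label in labels:
        groups[find(label)].add(label)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def conditional_entropy(pairs: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """H(Y | X) in bits, estimated from (x, y) samples."""
    by_x: Dict[Hashable, Counter] = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    n = sum(sum(c.values()) for c in by_x.values())
    h = 0.0
    for counts in by_x.values():
        n_x = sum(counts.values())
        h_x = -sum((c / n_x) * math.log2(c / n_x) for c in counts.values())
        h += (n_x / n) * h_x
    return h


def drifting_groups(reference: Sequence[Sample], current: Sequence[Sample],
                    groups: Sequence[frozenset],
                    threshold: float) -> List[frozenset]:
    """Label groups whose H(Y_g | X) changed by more than threshold."""
    drifted = []
    for g in groups:
        # Project each labelset onto the group before measuring entropy.
        h_ref = conditional_entropy((x, frozenset(ls) & g) for x, ls in reference)
        h_cur = conditional_entropy((x, frozenset(ls) & g) for x, ls in current)
        if abs(h_cur - h_ref) > threshold:
            drifted.append(g)
    return drifted


# Usage: in the current window, feature bin 0 no longer determines label "b".
reference = [(0, {"a", "b"})] * 4 + [(1, {"c"})] * 4
current = [(0, {"a", "b"})] * 2 + [(0, {"a"})] * 2 + [(1, {"c"})] * 4
groups = group_labels([ls for _, ls in reference], ["a", "b", "c"], tau=0.5)
print(groups)
print(drifting_groups(reference, current, groups, threshold=0.2))
```

In the usage example, H(Y_g | X) for the group {"a", "b"} rises from 0 to 0.5 bits because feature bin 0 has become less informative about that label subset, so only that group is flagged; the group {"c"} is unchanged.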
Keywords: Multi-label data streams, Concept drift, Class incremental learning, Probability of relevance, Label dependence, Entropy