
Research On Concept Drift Algorithm For Data Stream Learning

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: M M Han
Full Text: PDF
GTID: 2428330605472940
Subject: Computer Science and Technology
Abstract/Summary:
Concept drift is a common problem in data stream learning: the distribution of samples in feature space changes over time, which usually degrades model performance. The causes of concept drift vary across application scenarios, and handling it is one of the key problems in data stream learning. Algorithms for handling concept drift fall into two categories: adaptive algorithms and active detection algorithms. Adaptive algorithms are continually updated with new data to follow changes in the concept, while active detection algorithms focus on detecting whether the concept has drifted and on locating the timestamp at which the drift occurred. This thesis studies concept drift algorithms in depth, addresses shortcomings of existing methods, and proposes an adaptive ensemble algorithm, CIUE, and an active detection algorithm, ECBDM.

First, the AUE2 algorithm lacks a means of enhancing the diversity of its weak classifiers during their incremental training. To address this, the thesis proposes CIUE, an ensemble algorithm based on the Boosting idea that adapts to concept drift. CIUE borrows AUE2's mechanism of incrementally training weak classifiers on small data blocks as the algorithm input, and improves the incremental training process to enhance the diversity of the weak classifiers. CIUE also places higher requirements on weak classifier performance. Whereas AUE2 maintains a fixed number of weak classifiers, CIUE maintains a variable number. Combined with a weak classifier cache queue, CIUE can better adapt to concept drift and resist the effect of noise on the classifier. Experiments show that CIUE achieves higher average classification accuracy than existing ensemble algorithms.

Second, the thesis proposes ECBDM, an active detection method for concept drift. Most existing active detection algorithms focus on the classification performance of models, while ECBDM focuses on how the spatial distribution of samples changes. Based on Bayes decision theory and the Chernoff-Hoeffding bound, the algorithm detects concept drift by monitoring changes in the joint entropy and centroid of the samples in a sliding window. As the window slides, old samples are removed and new samples are added. During this process, both the joint entropy and the centroid can be updated in constant time. The algorithm therefore has constant time complexity and fast processing speed when no concept drift occurs. Experimental results show that ECBDM, with a reasonable threshold, detects concept drift well.
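
The constant-time window update mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's ECBDM implementation: the class name `WindowStats`, the use of a pre-discretised `key` (for example a tuple of feature bins plus the label) to estimate joint entropy, and the textbook Hoeffding form in `hoeffding_bound` are all assumptions made for illustration. The sketch only shows one way the centroid and an entropy estimate can be maintained incrementally as samples enter and leave a window.

```python
import math
from collections import Counter, deque


def hoeffding_bound(value_range, delta, n):
    """Textbook Hoeffding epsilon for the mean of n samples whose values
    span a range of width value_range, at confidence 1 - delta."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


class WindowStats:
    """Hypothetical helper: sliding window with incrementally updated
    centroid and entropy (assumed bookkeeping, not the thesis's code)."""

    def __init__(self, size, dim):
        self.size = size
        self.window = deque()
        self.sums = [0.0] * dim   # per-feature running sums, used for the centroid
        self.counts = Counter()   # occurrences of each discretised cell
        self.c_log_c = 0.0        # running sum of c * log(c) over all cells

    def _bump(self, key, step):
        # Adjust one cell count and patch the c*log(c) sum in O(1).
        old = self.counts[key]
        new = old + step
        if old > 0:
            self.c_log_c -= old * math.log(old)
        if new > 0:
            self.c_log_c += new * math.log(new)
            self.counts[key] = new
        else:
            del self.counts[key]

    def add(self, x, key):
        # Evict the oldest sample once the window is full, then insert.
        if len(self.window) == self.size:
            old_x, old_key = self.window.popleft()
            for i, v in enumerate(old_x):
                self.sums[i] -= v
            self._bump(old_key, -1)
        self.window.append((x, key))
        for i, v in enumerate(x):
            self.sums[i] += v
        self._bump(key, +1)

    def centroid(self):
        n = len(self.window)
        return [s / n for s in self.sums]

    def entropy(self):
        # H = -sum((c/n) * log(c/n)) = log(n) - (1/n) * sum(c * log(c)), in nats.
        n = len(self.window)
        return math.log(n) - self.c_log_c / n
```

Each call to `add` costs time independent of the window length (linear only in the feature dimension). A detector along the lines the abstract describes might compare these statistics against those of a reference window and flag drift when the difference exceeds a threshold such as the value returned by `hoeffding_bound`. How ECBDM actually combines the two signals is not stated in the abstract.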
Keywords/Search Tags: data stream, concept drift, ensemble algorithm, active detection, joint entropy