
Research On Concept Drift Data Stream Classification Based On Ensemble Learning

Posted on: 2021-01-13  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Wang  Full Text: PDF
GTID: 2428330647452823  Subject: Computer Science and Technology
Abstract/Summary:
With the development of technology and the progress of society, especially information technology, data in various forms are being produced across different industries. As a new type of data, data streams exhibit characteristics such as high dimensionality, concept drift, scarce labels, high arrival speed, noise, and class imbalance. At the same time, data stream classification places strict demands on time and space. Existing classification methods designed for static data sets struggle to cope with these characteristics, so algorithms for data stream classification have gradually become a hot topic in the field of data mining. This thesis focuses on data stream classification based on ensemble learning. After a deeper study of the characteristics of data streams and the corresponding classification techniques, it is found that the most important problem in data stream classification is concept drift. To deal with concept drift, several optimizations of existing data stream classification algorithms are proposed. The main innovations are as follows:

Firstly, this thesis proposes an ensemble learning model based on a reward mechanism, called the REWARD (RE) ensemble. The method builds on Bagging-style ensemble learning and adjusts the weights of the base classifiers by borrowing the reward mechanism of reinforcement learning. Once the ensemble is deployed on a stream, the instances that have already been classified are used to train the base classifiers incrementally, and the weight of each base classifier is updated according to its classification results. When the weight of a base classifier falls below a threshold, the ensemble prunes that poorly performing classifier and creates a new base classifier from buffered stream instances. This method can effectively handle the concept drift that may occur in data stream classification, especially incremental drift; a simplified sketch of this loop is given below.
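The following Python sketch is only a rough illustration of this weighted-ensemble idea, not the thesis' actual implementation: each member is rewarded or penalised depending on whether it classified the current instance correctly, and members whose weight falls below a threshold are replaced by a classifier trained on a buffer of recent instances. The GaussianNB base learner, the reward and penalty values, the pruning threshold and the buffer size are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


class RewardEnsemble:
    """Bagging-style ensemble whose member weights are raised or lowered by a
    simple reward/penalty rule, with pruning and replacement of weak members."""

    def __init__(self, n_estimators=10, classes=None, reward=0.05,
                 penalty=0.05, prune_threshold=0.2, buffer_size=200):
        self.classes = classes                     # full label set, required by partial_fit
        self.members = [GaussianNB() for _ in range(n_estimators)]
        self.weights = np.ones(n_estimators)       # all members start with equal weight
        self.reward, self.penalty = reward, penalty
        self.prune_threshold = prune_threshold
        self.buffer = []                           # recent labelled instances for rebuilding
        self.buffer_size = buffer_size

    def predict_one(self, x):
        """Weighted majority vote over the already-fitted base classifiers."""
        X = np.asarray(x).reshape(1, -1)
        votes = {}
        for clf, w in zip(self.members, self.weights):
            if hasattr(clf, "classes_"):           # skip members not yet trained
                label = clf.predict(X)[0]
                votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get) if votes else None

    def learn_one(self, x, y):
        """Reward/penalise each member, train it incrementally, prune weak members."""
        X = np.asarray(x).reshape(1, -1)
        for i, clf in enumerate(self.members):
            if hasattr(clf, "classes_"):
                correct = clf.predict(X)[0] == y
                delta = self.reward if correct else -self.penalty
                self.weights[i] = max(self.weights[i] + delta, 0.0)
            clf.partial_fit(X, [y], classes=self.classes)   # incremental update
        self.buffer = (self.buffer + [(np.asarray(x), y)])[-self.buffer_size:]
        # replace members whose weight fell below the pruning threshold
        labels = {label for _, label in self.buffer}
        for i, w in enumerate(self.weights):
            if w < self.prune_threshold and len(labels) > 1:
                fresh = GaussianNB()
                fresh.partial_fit(np.array([v for v, _ in self.buffer]),
                                  np.array([lbl for _, lbl in self.buffer]),
                                  classes=self.classes)
                self.members[i], self.weights[i] = fresh, 1.0
```

In a prequential setting, one would call predict_one(x) before learn_one(x, y) for each arriving instance, so that every prediction is made before the true label is used for training; note that scikit-learn's partial_fit requires the full list of class labels to be supplied up front.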
Secondly, because the reward-mechanism ensemble is less effective at handling sudden drift, this thesis also proposes a multi-type base classifier ensemble method based on the idea of Bagging. The method divides the n base classifiers of the ensemble into two halves: half are set as stable base classifiers and the other half as dynamic base classifiers. The stable base classifiers are updated on the data stream by incremental learning, while the dynamic base classifiers are rebuilt after each period of the stream, using the instances classified in the most recent time interval. To further mitigate sudden drift, an abandon mechanism is introduced for the stable base classifiers: if the classification accuracy of a stable base classifier falls below a certain threshold, it is temporarily abandoned to protect the overall accuracy, and it is re-enabled once its accuracy recovers through incremental learning. A sketch of this stable/dynamic design follows the abstract.

To verify the reliability and effectiveness of the proposed models, the experimental parameters are first determined through internal comparative experiments, and the proposed algorithms are then compared with similar data stream classification algorithms on both synthetic and real data sets. The experiments show that, in the presence of concept drift in the data stream, both methods achieve higher classification accuracy and each has its own advantages.
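For the second contribution, the sketch below shows one plausible reading of the stable/dynamic split and the abandon mechanism: half of the members are trained incrementally throughout the stream, the other half are rebuilt from the most recent window of labelled instances, and a stable member whose running accuracy drops below a threshold is skipped during voting until it recovers. Again, the GaussianNB learners, the window length and the accuracy threshold are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np
from collections import deque
from sklearn.naive_bayes import GaussianNB


class StableDynamicEnsemble:
    """Half stable (incremental) and half dynamic (periodically rebuilt) members,
    with an abandon mechanism for underperforming stable members."""

    def __init__(self, n_estimators=10, classes=None,
                 window=500, abandon_threshold=0.6):
        half = n_estimators // 2
        self.classes = classes                        # full label set (needed by partial_fit)
        self.stable = [GaussianNB() for _ in range(half)]                  # trained incrementally
        self.dynamic = [GaussianNB() for _ in range(n_estimators - half)]  # rebuilt every window
        self.window = window
        self.abandon_threshold = abandon_threshold
        self.recent = deque(maxlen=window)            # labelled instances of the current window
        self.stable_hits = np.zeros(half)             # running accuracy bookkeeping
        self.stable_seen = np.zeros(half)
        self.seen = 0

    def _vote(self, members, X, votes, check_abandon=False):
        for i, clf in enumerate(members):
            if not hasattr(clf, "classes_"):
                continue                              # member not trained yet
            if check_abandon and self.stable_seen[i] > 0:
                acc = self.stable_hits[i] / self.stable_seen[i]
                if acc < self.abandon_threshold:
                    continue                          # temporarily abandoned stable member
            label = clf.predict(X)[0]
            votes[label] = votes.get(label, 0) + 1

    def predict_one(self, x):
        """Majority vote of the trained, non-abandoned members."""
        X = np.asarray(x).reshape(1, -1)
        votes = {}
        self._vote(self.stable, X, votes, check_abandon=True)
        self._vote(self.dynamic, X, votes)
        return max(votes, key=votes.get) if votes else None

    def learn_one(self, x, y):
        """Update stable members incrementally; rebuild dynamic members each window."""
        X = np.asarray(x).reshape(1, -1)
        for i, clf in enumerate(self.stable):
            if hasattr(clf, "classes_"):              # score before training on this instance
                self.stable_seen[i] += 1
                self.stable_hits[i] += int(clf.predict(X)[0] == y)
            clf.partial_fit(X, [y], classes=self.classes)
        self.recent.append((np.asarray(x), y))
        self.seen += 1
        if self.seen % self.window == 0 and len({lbl for _, lbl in self.recent}) > 1:
            Xw = np.array([v for v, _ in self.recent])
            yw = np.array([lbl for _, lbl in self.recent])
            self.dynamic = [GaussianNB().partial_fit(Xw, yw, classes=self.classes)
                            for _ in self.dynamic]
```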
Keywords/Search Tags: Data stream classification, Ensemble learning, Concept drift, Reward mechanism, Multi-type base classifiers