Incremental Learning Algorithms With Concept Drift Adaptation

Posted on: 2018-05-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Sun
Full Text: PDF
GTID: 1318330512485617
Subject: Computer software and theory
Abstract/Summary:
With the coming of the big data era, processing and learning from large-scale data have attracted much attention from the research community, and these tasks also support high-quality industrial applications and everyday services. Incremental learning handles large-scale data by updating learning machines (models) when new training data arrives, and it has been widely studied in recent years. However, concept drift, i.e., a change in the joint distribution of the data, tends to degrade the performance of incremental learning and poses a great challenge to applying incremental learning in the real world. To handle concept drift in incremental learning, this thesis proposes two incremental learning algorithms with concept drift adaptation and designs a parallel learning implementation method. The main contributions are stated below.

First, to exploit historical knowledge in incremental learning for concept drift adaptation, a novel ensemble learning method, namely Diversity and Transfer based Ensemble Learning (DTEL), is proposed. It is assumed that the historical knowledge is related to the knowledge in the current learning step. Hence, transfer learning can be applied to concept drift adaptation. On the one hand, it exploits the useful knowledge in a model trained on historical data (i.e., a historical model); on the other hand, it avoids the negative impact of the inconsistent information in that model. Moreover, because memory is limited, only a fixed number of historical models can be preserved in the learning system. A diversity-based model selection criterion is employed to decide which previously trained models to preserve, so that as much knowledge as possible is available for the transfer operation and the concept drift adaptation task. To verify the effectiveness of DTEL, multiple sets of synthetic data and real-world data are tested in the experiments. The synthetic data involves five different types of concept drift, and the real-world data comes from four different real applications. Empirical results show that DTEL handles concept drift effectively and performs satisfactorily on different types of concept drift.

Secondly, to handle class evolution, a class-based ensemble learning algorithm (CBCE) is proposed. Class evolution, which is a special type of concept drift, refers to the emergence and disappearance of classes. Existing work on class evolution implicitly assumes that classes emerge or disappear in a transient manner, which does not hold for many real-world problems. This work investigates the class evolution problem with gradually evolving classes. To deal with class evolution, the algorithm maintains a base learner for every class. Specifically, it initializes a new model when a class emerges and deactivates the corresponding model when a class disappears. The gradual evolution of classes causes a dynamic class-imbalance problem. To handle this problem, a novel under-sampling method is designed and embedded in each base model. Class evolution has three basic elements, i.e., the emergence of novel classes, the disappearance of outdated classes, and the reoccurrence of disappeared classes. In the experiments, synthetic data and real-world data are used to represent different types of class evolution, in order to verify the performance of CBCE comprehensively. Two real-world data sets are processed to simulate the phenomenon of class evolution, and data from a social network application is used as the real-world data. Empirical studies verify the reliability of CBCE in handling class evolution and show that CBCE can also deal with the dynamic class-imbalance problem caused by gradual class evolution.

Finally, to apply the incremental learning algorithms in real-world applications, a parallel learning implementation method for concept drift adaptation in incremental learning is designed. In real-world applications that learn from big data, an algorithm not only needs high prediction accuracy but also has to meet time-efficiency requirements, because data is generated rapidly. A parallelizable algorithm is the precondition for building a parallel learning system. In incremental learning, ensemble learning algorithms are naturally parallel. To improve the time efficiency of the learning algorithms, this work analyzes the ensemble models used in incremental learning, derives a general parallel learning implementation method, and shows how to implement an algorithm with it. In addition, the two ensemble learning algorithms proposed in this thesis, i.e., DTEL and CBCE, are implemented with the parallel implementation method and tested in the experiments. The experimental results show that the parallelized DTEL and CBCE algorithms achieve a high speed-up ratio compared with the original ones, which verifies the effectiveness of the parallel implementation method.
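The class-based scheme described in the abstract (one base learner per class, a new model when a class emerges, deactivation when a class disappears, and under-sampling inside each base model) can be illustrated with a minimal Python sketch. This is not the thesis's CBCE implementation: the names `OnlineLogit` and `ClassBasedEnsemble`, the `patience` rule for declaring a class disappeared, and the fixed `neg_keep` probability used for under-sampling are all assumptions made for illustration only.

```python
import math
import random


class OnlineLogit:
    """Tiny online logistic regression trained by SGD (a stand-in base learner)."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def score(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, target):
        err = target - self.score(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err


class ClassBasedEnsemble:
    """One binary model per class; models are added, deactivated, and reactivated."""

    def __init__(self, n_features, patience=50, neg_keep=0.5, seed=0):
        self.n_features = n_features
        self.patience = patience  # assumption: steps without a class before it counts as disappeared
        self.neg_keep = neg_keep  # assumption: probability of keeping a negative example
        self.models = {}          # label -> OnlineLogit
        self.last_seen = {}       # label -> step at which the label last appeared
        self.step = 0
        self.rng = random.Random(seed)

    def active_classes(self):
        return [c for c, t in self.last_seen.items()
                if self.step - t <= self.patience]

    def learn_one(self, x, y):
        self.step += 1
        if y not in self.models:       # class emergence: start a new model
            self.models[y] = OnlineLogit(self.n_features)
        self.last_seen[y] = self.step  # also reactivates a reoccurring class
        for c in self.active_classes():
            if c == y:
                self.models[c].update(x, 1.0)
            elif self.rng.random() < self.neg_keep:  # under-sample negatives
                self.models[c].update(x, 0.0)

    def predict_one(self, x):
        active = self.active_classes()
        if not active:
            return None
        return max(active, key=lambda c: self.models[c].score(x))
```

Feeding the ensemble a stream with `learn_one(x, y)` and querying it with `predict_one(x)` exercises all three elements of class evolution: an unseen label creates a model, a label absent for more than `patience` steps drops out of `active_classes()`, and a label that returns is reactivated with its old model. A fixed `neg_keep` is the simplest possible choice; a method that tracks changing class proportions would adapt this rate over time.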
Keywords/Search Tags: Incremental Learning, Concept Drift, Ensemble Learning, Online Learning, Data Stream Mining, Supervised Learning