Concept Drift Adaptation Based on Ensemble Learning

Posted on: 2022-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: C L Liao
Full Text: PDF
GTID: 2518306536963609
Subject: Control Science and Engineering
Abstract/Summary:
Concept drift is the phenomenon in which the probability distribution of data changes over time. It is widespread in data-driven applications. Under concept drift, models built on historical data are likely to suffer performance degradation or even fail. To address two problems, poor model accuracy caused by insufficient use of historical knowledge and limited ability to handle class-imbalanced data, this thesis studies concept drift adaptation algorithms based on ensemble learning. The main contributions are as follows:

(1) To address the poor model accuracy caused by insufficient use of historical knowledge, Knowledge Transfer Ensemble (KTE) is proposed. Unlike existing accuracy-based ensemble methods, KTE consists of three parts: knowledge preservation, historical model adaptation, and an ensemble strategy. The relation between the data probability distribution and concept adaptation is exploited in both knowledge preservation and the ensemble strategy. In the knowledge preservation part, historical knowledge is preserved by optimizing the maximum mean discrepancy (MMD), which ensures the diversity of knowledge in the sub-classifiers and improves the generalization ability of the ensemble model. For each preserved classifier, an adaptive tree algorithm transfers knowledge from the current data chunk to improve the adaptability of the historical classifier. A nonlinear weighting function then assigns a weight to each sub-model from its classification accuracy and a distribution discrepancy factor. This keeps the ensemble's performance high throughout the learning process. Experiments on synthetic data such as SIN and real-world datasets such as Covertype show that KTE has clear advantages over the five comparison algorithms.

(2) To address the poor generalization ability of existing algorithms when faced with a variety of complex concept drifts, Principal Component Random Forest (PCRF) is proposed. PCRF introduces principal component analysis and feature stratification into the subtree construction process, which improves the quality of the discriminative information contained in the subtrees and enhances the accuracy and generalization ability of PCRF. The samples obtained in the concept drift warning stage are used to construct background trees. In the drift stage, the worst-performing subtrees are replaced with the background trees, which improves the adaptive ability of PCRF. Experiments on four different types of synthetic concept drift data and four real-world concept drift datasets show that PCRF performs well in the face of various types of concept drift and their unknown combinations.

(3) A novel concept drift adaptation algorithm, Multi-Kernel Cost-Sensitive Ensemble (MK-CSE), is proposed to improve the handling of class-imbalanced environments. MK-CSE handles class-imbalanced data in two ways: knowledge preservation based on multi-kernel MMD, and multiple cost sensitivity. MK-CSE obtains the optimal linear kernel combination for the mapping process to ensure the robustness of the measurement and preserve the diversity of historical knowledge. A multiple cost-sensitive method is studied that fuses cost sensitivity at the feature level and the ensemble level. At the feature level, a dynamic sensitive feature space is constructed, highly discriminative features oriented to the current task are extracted, and the characteristics of minority-class samples are fully captured. At the ensemble level, misclassification cost, classification accuracy, and the distribution discrepancy factor are fused nonlinearly for weight assignment, which improves the algorithm's robustness in class-imbalanced environments. Results on synthetic datasets such as imbalanced SEA and real-world datasets such as Weather show that MK-CSE not only adapts to concept drift more stably, but also achieves the best classification results on imbalanced concept drift data.
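
The maximum mean discrepancy (MMD) that KTE and MK-CSE rely on to compare data distributions can be illustrated with a short sketch. This is not the thesis's implementation. It is a minimal biased estimate of squared MMD, assuming Gaussian (RBF) kernels. The function names, the `gammas` bandwidth parameters, and the fixed kernel `weights` are all hypothetical choices made for illustration. In particular, MK-CSE is described as finding an optimal linear kernel combination, whereas the weights here are simply supplied by the caller.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) for equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(kernel(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gammas=(1.0,), weights=None):
    """Biased estimate of squared MMD between two samples.

    With one gamma this is single-kernel MMD. With several gammas the kernel
    is a weighted sum of RBF kernels. The weights are fixed by the caller
    (uniform by default); they are NOT optimised as in the thesis.
    """
    if weights is None:
        weights = [1.0 / len(gammas)] * len(gammas)

    def kernel(x, y):
        return sum(w * rbf_kernel(x, y, g) for w, g in zip(weights, gammas))

    within_x = mean_kernel(xs, xs, kernel)
    within_y = mean_kernel(ys, ys, kernel)
    between = mean_kernel(xs, ys, kernel)
    return within_x + within_y - 2.0 * between


# Hypothetical data chunks: the second is a shifted copy of the first.
old_chunk = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_chunk = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]

same = mmd_squared(old_chunk, old_chunk)
shifted = mmd_squared(old_chunk, new_chunk)
multi = mmd_squared(old_chunk, new_chunk, gammas=(0.1, 1.0))
```

Comparing a chunk with itself gives zero, while the shifted chunk gives a clearly positive value. A larger value indicates a larger gap between the two empirical distributions, which is the kind of signal a distribution discrepancy factor could be built on.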
Keywords/Search Tags:Adaptive algorithm, Concept drift, Class imbalance, Ensemble learning