
Research On Concept Drift Convergence Method Based On Adaptive Ensemble Learning

Posted on: 2024-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: N Sun
Full Text: PDF
GTID: 2568307115463884
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, large volumes of dynamically updated data, i.e., streaming data, are being generated at an explosive rate in many practical fields. Unlike traditional static data, streaming data has complex characteristics: it is time-ordered, dynamic and potentially infinite, and its distribution can change in real time. These characteristics pose several challenges to traditional static data mining techniques and algorithms. Concept drift, in which the data distribution changes continuously over time, is an important problem in data mining that merits in-depth study, and it has received increasing attention. Building models with the flexibility of ensemble learning is currently a proven means of dealing with concept drift. However, most existing methods can hardly prevent the performance degradation caused by noise and cannot effectively extract key information from streaming data. As a result, the learning models cannot adapt to changes in the data distribution in a timely manner, and their real-time generalization performance is poor. To solve these problems, this paper proposes a concept drift convergence method based on adaptive ensemble learning, which addresses the overall performance degradation, weak noise resistance and slow convergence of the learning model after concept drift. The specific research contents are as follows:

(1) After concept drift occurs in streaming data, there are too few samples from the new distribution, so the performance of online learning models degrades and does not converge quickly. To address this problem, an adaptive hybrid ensemble method that accelerates adaptation to concept drift is proposed. The method extracts local information from the streaming data through the weighted base classifiers in a classifier pool, and supplements the current data block with this local information by data expansion. This compensates for the shortage of data from the current distribution after concept drift and yields an efficient local base learner that conforms to the current data distribution. On this basis, the local base learner extracts key data information at different stages, and the current data are adaptively selected according to the data distribution to construct diverse global base learners. Through the hybrid ensemble of the high-performance local base learner and the diverse global base learners, the online ensemble model learns adaptively from the changing stream and adapts better after concept drift occurs.

(2) Noise in streaming data weakens the noise resistance of the ensemble model and prevents it from converging quickly after concept drift occurs. To address this problem, a concept drift convergence method based on dynamic boundary shrinking is proposed. On the one hand, the decision boundary is dynamically shrunk by adjusting the extension distance and the intension distance, so that noisy samples are eliminated from newly arrived data. This effectively reduces the negative effect of boundary noise on model convergence and improves the generalization performance of the model. On the other hand, following the idea of incremental learning, the method updates an incremental learner by training it incrementally on new samples from the stream, and uses a weight elimination mechanism to promptly delete dynamic classifiers that cannot adapt to the current data distribution. In addition, by combining the dynamic classifiers and the incremental learner in an ensemble, the model adapts well in both stationary and non-stationary environments.

The research in this paper applies ensemble learning to build an efficient ensemble model that is updated dynamically in real time. It reinforces the key information in streaming data and reduces the impact of boundary noise on the model, which improves the model's adaptability to changing streaming data after drift. The proposed methods accelerate the convergence of the model while improving its real-time generalization performance after drift, and they provide a new approach for ensemble learning methods in streaming-data tasks that involve concept drift.
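The abstract does not specify the base learners, the weighting formula or the elimination threshold, so the following is only a minimal sketch of the general pattern both contributions build on: a pool of weighted base classifiers that is re-weighted on each new data block, with members that no longer fit the current distribution deleted by a weight elimination step. The class names `NearestCentroid` and `WeightedChunkEnsemble`, the accuracy-based weights, and the parameters `max_members` and `min_weight` are hypothetical choices, closer in spirit to classic accuracy-weighted chunk ensembles than to the thesis's own method. Data expansion, the local/global hybrid and boundary shrinking are not modeled.

```python
import math
from collections import Counter


class NearestCentroid:
    """Tiny stand-in base learner (hypothetical); any classifier could be used."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        # Centroid of each class = mean of its feature vectors in this block.
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x):
        # Predict the class whose centroid is closest (Euclidean distance).
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))


class WeightedChunkEnsemble:
    """Hypothetical block-based weighted ensemble with weight elimination."""

    def __init__(self, max_members=5, min_weight=0.3):
        self.max_members = max_members  # assumed cap on the classifier pool
        self.min_weight = min_weight    # assumed elimination threshold
        self.members = []               # list of [learner, weight] pairs

    def predict_one(self, x):
        # Weighted majority vote over all members of the pool.
        votes = Counter()
        for learner, weight in self.members:
            votes[learner.predict_one(x)] += weight
        return votes.most_common(1)[0][0] if votes else None

    def partial_fit(self, X, y):
        # 1. Re-weight each member by its accuracy on the newest data block.
        for member in self.members:
            learner = member[0]
            correct = sum(learner.predict_one(x) == t for x, t in zip(X, y))
            member[1] = correct / len(y)
        # 2. Weight elimination: drop members that no longer fit the current data.
        self.members = [m for m in self.members if m[1] >= self.min_weight]
        # 3. Train a new member on the newest block and give it full weight.
        self.members.append([NearestCentroid().fit(X, y), 1.0])
        # 4. Keep only the highest-weighted members if the pool is too large.
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]
        return self


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (5, 5), (5, 6)]
    ens = WeightedChunkEnsemble()
    ens.partial_fit(X, [0, 0, 1, 1])
    print(ens.predict_one((0, 0)))  # 0: first concept
    ens.partial_fit(X, [1, 1, 0, 0])  # abrupt drift: labels swap
    print(ens.predict_one((0, 0)), len(ens.members))  # 1 1
```

Re-weighting by accuracy on the newest block is what lets stale members fade out after an abrupt drift: in the demo, the member trained before the label swap scores 0 on the drifted block, falls below `min_weight` and is eliminated, so the ensemble follows the new concept after a single block.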
Keywords/Search Tags: streaming data, concept drift, ensemble learning, hybrid ensemble, boundary shrink