
Research On Selective Ensemble Classification Method For Concept Drift Data Stream

Posted on: 2024-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2568307115957459
Subject: Computer technology
Abstract/Summary:
In many practical scenarios, such as weather forecasting and stock market trends, data are generated as a stream. Concept drift can degrade the performance of data stream classification models, and ensemble classification methods are an effective way to address the challenges it poses. The accuracy and diversity of the base classifiers are crucial to the generalization performance of an ensemble classifier. Selective ensemble classification methods can further improve performance by using evaluation methods to select well-performing base classifiers, so an important issue is how to choose the base classifiers. In addition, when concept drift occurs, the ensemble classifier can be adjusted by using active learning to request the true labels of samples. However, existing methods often ignore the impact of concept drift on the label request strategy, which may increase the probability of requesting labels for samples of low informational value and prevents timely adaptation to changes in the data stream. Furthermore, detecting concept drift and identifying its type are the basis for adapting to changes in the data stream, but current research on concept drift type identification is limited. This thesis therefore proposes a selective ensemble classification method for data streams with concept drift, which effectively addresses the above problems and improves the overall performance of data stream classification. The research content is as follows:

(1) A selective ensemble classification method for data streams is proposed. First, the method ensures the diversity of the base classifiers by randomly sampling the data so that each Hoeffding tree is trained on different data. Next, when selecting base classifiers, a similarity measure and the average error of each base classifier are defined to obtain its generalization error. Based on the generalization error, base classifiers with better generalization performance are selected for training the ensemble classifier, which improves the generalization ability of the classifier. Then, in active learning, the impact of concept drift on the label request strategy is taken into account, together with label cost and multi-class imbalance, so that the most uncertain samples are selected for label requests, thereby saving label cost. Finally, the ensemble classifier is updated when a concept drift warning occurs, and the entire ensemble classifier is retrained when concept drift occurs, which helps it adapt to the new data distribution in time and thereby improves classification accuracy.

(2) A concept drift type identification method is proposed. The method defines a representative-sample sliding window that stores representative samples selected through active learning before concept drift. The variance of the concept drift index is proposed to measure the stability of the sample distribution in the detection sliding window, so that the class of concept drift can be identified. Finally, by combining the variation patterns of the sample concept drift index before and after the drift, the subclass of concept drift can be further identified.

(3) A selective ensemble classification system for data streams with concept drift is developed. Based on the methods proposed in this thesis, the system implements functions such as data stream classification and drift type identification.
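
The base classifier selection step in (1) can be illustrated with a small sketch. The abstract does not give the thesis's actual similarity measure or generalization error formula, so everything below is an assumption for illustration only: similarity is taken as the fraction of samples on which two classifiers agree, the score is a weighted sum of average error and mean similarity, and the names `select_base_classifiers` and `alpha` are hypothetical.

```python
import numpy as np


def select_base_classifiers(predictions, y_true, k, alpha=0.5):
    """Return the indices of the k base classifiers with the lowest score.

    predictions: (n_classifiers, n_samples) predicted labels on a labelled chunk
    y_true:      (n_samples,) true labels for the same chunk
    alpha:       hypothetical weight between average error and similarity
    """
    predictions = np.asarray(predictions)
    y_true = np.asarray(y_true)
    n_classifiers = predictions.shape[0]

    # Average error of each base classifier on the chunk.
    avg_error = (predictions != y_true).mean(axis=1)

    # Assumed similarity: pairwise agreement rate between classifiers.
    agreement = (predictions[:, None, :] == predictions[None, :, :]).mean(axis=2)

    # Mean similarity to the other classifiers (self-agreement of 1.0 excluded).
    mean_similarity = (agreement.sum(axis=1) - 1.0) / (n_classifiers - 1)

    # Assumed stand-in for the generalization error: lower is better, so
    # accurate classifiers that differ from the rest are preferred.
    score = alpha * avg_error + (1.0 - alpha) * mean_similarity

    order = np.argsort(score, kind="stable")
    return order[:k].tolist(), score


y_true = [0, 1, 0, 1, 0, 1]
predictions = [
    [0, 1, 0, 1, 0, 1],  # always correct
    [0, 1, 0, 1, 0, 0],  # one mistake
    [1, 0, 1, 0, 1, 0],  # always wrong
    [0, 1, 0, 1, 1, 1],  # one mistake
]
selected, score = select_base_classifiers(predictions, y_true, k=3)
```

In this toy chunk the always-wrong classifier gets the highest score and is left out of the selection. With a larger weight on diversity (a smaller `alpha`) it could be retained, which is why the weighting in such a score needs tuning.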
Keywords/Search Tags: Data stream, Selective ensemble classification, Active learning, Concept drift type identification