Research On Concept Drift Data Stream Classification Algorithm Based On Ensemble Learning

Posted on: 2024-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: R H Cui
GTID: 2568307058459734
Subject: Radio Physics
Abstract/Summary:
The rapid development of information technology has driven the modernization of industry and commerce, and data sources generate data at an ever-increasing rate. A fast, continuous, potentially infinite, ordered sequence of data that changes over time is called a "data stream." Data streams exhibit concept drift: the data distribution changes over time, which is considered one of the main causes of declining accuracy in machine learning models. In a constantly changing big data environment, providing more reliable and effective machine learning models has become an active research topic. Concept drift prevents traditional data analysis and processing techniques from achieving good results, which poses a major challenge for data mining and data stream analysis. Currently, most concept drift algorithms target a single type of drift and require the input data to follow a particular distribution, so they perform poorly when multiple types of concept drift occur. Classification models based on ensemble learning offer good performance, easy fusion, stronger robustness, and applicability to different sample sizes, and they can handle multiple types of concept drift in streaming data. Using the idea of online ensemble learning, this thesis proposes an online ensemble adaptive algorithm for data streams with concept drift, which improves classification accuracy and suits various types of concept drift. The main work is as follows:

(1) Improved versions of the adaptive random forest (ARF) and streaming random patches (SRP) algorithms are proposed. Both ARF and SRP rely on parametric tests, which assume that the data follow some underlying distribution, and their classification accuracy is low in concept drift scenarios. To address this, improved ARF and SRP algorithms based on nonparametric testing are proposed. The algorithms combine the nonparametric Kolmogorov-Smirnov test (KS-test) with sliding windows as the concept drift detection mechanism. Two sliding windows are constructed, the KS-test compares the absolute distance between the empirical cumulative distributions of the data in the two windows, and a warning threshold and a drift threshold are set. When the detected distance exceeds the warning threshold, a background tree is trained for each tree on the newly arrived data; when it exceeds the drift threshold, the existing tree is replaced by its background tree to adapt to the new concept. Because nonparametric tests infer the overall distribution from the sample itself, the algorithm is more widely applicable and more accurate.

(2) An online ensemble adaptive algorithm (KSHPR) is proposed. Based on the improved ARF and SRP algorithms, four base learners are constructed: ARF-HDDM_A, ARF-KSWIN, SRP-HDDM_W, and SRP-KSWIN. The classification accuracy of each base learner is computed in real time and used to assign it a dynamic weight, and the base learners' outputs are then combined by weighted averaging. Updating the weights online in real time helps improve classification accuracy. The base learners also add randomness and diversity to model construction, yielding a robust and diverse ensemble that adapts well to complex real-world data streams with multiple types of concept drift.

Experiments show that the proposed algorithm performs well on both real and synthetic datasets, and that its stability, classification accuracy, and adaptability to multiple types of concept drift improve on those of other algorithms.
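The two-window KS-test mechanism in item (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The class name `TwoWindowKSDetector`, the window size, the threshold values, and the way the reference window is refreshed after a drift are all assumptions made for illustration; the abstract does not specify them. The sketch shows only the detection signal, not the background-tree training or replacement that the signal triggers.

```python
from bisect import bisect_right
from collections import deque
import random


def ks_distance(a, b):
    """Two-sample KS statistic: largest absolute gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    # Both ECDFs are step functions, so the maximum gap occurs at an observed value.
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


class TwoWindowKSDetector:
    """Hypothetical detector: compares a reference window with a recent window."""

    def __init__(self, window_size=100, warning_threshold=0.2, drift_threshold=0.3):
        # All three parameter values are illustrative assumptions.
        self.window_size = window_size
        self.warning_threshold = warning_threshold
        self.drift_threshold = drift_threshold
        self.reference = deque(maxlen=window_size)  # data from the current concept
        self.recent = deque(maxlen=window_size)     # most recently arrived data

    def update(self, value):
        """Consume one value and return 'stable', 'warning', or 'drift'."""
        if len(self.reference) < self.window_size:
            self.reference.append(value)
            return "stable"
        self.recent.append(value)
        if len(self.recent) < self.window_size:
            return "stable"
        distance = ks_distance(self.reference, self.recent)
        if distance > self.drift_threshold:
            # Assumed policy: the recent window becomes the new reference.
            self.reference = deque(self.recent, maxlen=self.window_size)
            self.recent.clear()
            return "drift"
        if distance > self.warning_threshold:
            return "warning"
        return "stable"


if __name__ == "__main__":
    rng = random.Random(0)
    detector = TwoWindowKSDetector()
    # Synthetic stream whose mean shifts abruptly at index 500.
    stream = [rng.gauss(0.0, 1.0) for _ in range(500)]
    stream += [rng.gauss(3.0, 1.0) for _ in range(500)]
    for i, x in enumerate(stream):
        state = detector.update(x)
        if state != "stable":
            print(i, state)
```

In an ARF- or SRP-style ensemble, a "warning" result would start training a background tree and a "drift" result would swap it in, as the abstract describes.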
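The accuracy-based weighting in item (2) can be sketched in the same hedged way. The abstract states only that each base learner's real-time accuracy determines its weight and that outputs are combined by weighted averaging. The sliding accuracy window, the class name `AccuracyWeightedEnsemble`, and the fallback to equal weights are assumptions. Base learners are represented here just by their names and class-probability outputs, not by real ARF or SRP models.

```python
from collections import deque


class AccuracyWeightedEnsemble:
    """Hypothetical combiner: weights each base learner by its recent accuracy."""

    def __init__(self, learner_names, window_size=200):
        # window_size is an illustrative assumption, not a value from the thesis.
        self.history = {name: deque(maxlen=window_size) for name in learner_names}

    def weight(self, name):
        hits = self.history[name]
        # Before any feedback arrives, every learner gets the same weight.
        return sum(hits) / len(hits) if hits else 1.0

    def combine(self, probas):
        """probas: {learner: {label: probability}} -> weighted-average distribution."""
        weights = {name: self.weight(name) for name in probas}
        total = sum(weights.values())
        if total == 0:
            # Every learner is at zero accuracy: fall back to equal weights.
            weights = {name: 1.0 for name in probas}
            total = float(len(probas))
        combined = {}
        for name, dist in probas.items():
            for label, p in dist.items():
                combined[label] = combined.get(label, 0.0) + weights[name] * p / total
        return combined

    def predict(self, probas):
        combined = self.combine(probas)
        return max(combined, key=combined.get)

    def record(self, probas, true_label):
        """Test-then-train feedback: score each learner once the true label is known."""
        for name, dist in probas.items():
            predicted = max(dist, key=dist.get)
            self.history[name].append(1 if predicted == true_label else 0)


if __name__ == "__main__":
    ensemble = AccuracyWeightedEnsemble(["ARF-KSWIN", "SRP-KSWIN"])
    outputs = {
        "ARF-KSWIN": {0: 0.9, 1: 0.1},
        "SRP-KSWIN": {0: 0.2, 1: 0.8},
    }
    print(ensemble.predict(outputs))
    ensemble.record(outputs, true_label=1)
    print(ensemble.predict(outputs))
```

Calling `record` after each prediction follows the usual test-then-train order for stream evaluation, so a learner that degrades after a drift loses influence as soon as its errors enter the window.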
Keywords/Search Tags: data stream, concept drift, online learning, ensemble learning