Class-Imbalanced Data Stream Classification Method Based On Adaptive Random Forest

Posted on: 2022-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Chu
GTID: 2518306533972349
Subject: Control Science and Engineering
Abstract/Summary:
Streaming data is the form of data closest to real-world production and daily life, and learning from data streams is a common problem in machine learning. Class imbalance and concept drift, both of which may occur in streaming data, make it a challenging one. When the number of samples differs greatly between classes, traditional imbalanced learning is used to limit the influence of the majority classes on the classification model and to improve recognition accuracy for minority-class samples. However, existing imbalanced learning methods lack efficient classification models and detection mechanisms for dynamic stream data, which are needed to identify concept drift accurately and to classify the data promptly and correctly. To address this, this thesis combines the random forest algorithm with an online learning mechanism and proposes a class of adaptive random forest algorithms for classifying imbalanced stream data. In addition, concept drift detection and response strategies are designed for the drift characteristics of imbalanced stream data, and the resulting methods are applied to the practical problem of human activity recognition. The main research work is as follows:

(1) For imbalanced streaming data, a data stream classification algorithm based on adaptive random forest is proposed. A SMOTE-based resampling strategy, driven by the imbalance ratio of the stream, is used to handle the class imbalance. With the Hoeffding tree as the base decision tree, a method for determining the threshold used during splitting in the imbalance-oriented decision tree is derived, and its influence on classification accuracy is analyzed. Experimental data show that the proposed adaptive random forest classification algorithm has significant advantages for online classification of imbalanced stream data.

(2) For the concept drift characteristics of imbalanced stream data, a concept drift detection and classification method for imbalanced data streams based on adaptive random forest is proposed. First, a new definition and categorization of concept drift is given for imbalanced data, and concept drift in the imbalanced data stream is detected from the classification error rate and the out-of-bag (OOB) validation error. Second, an improved drift detection and classification method based on adaptive random forest is proposed that keeps the ensemble size controllable and adjusts weights adaptively. The REP (Reduced-Error Pruning) post-pruning algorithm is used to prune outdated decision trees that had high performance, improving classification efficiency under concept drift. On the one hand, the proposed algorithm reduces the number of hyperparameters. On the other hand, the strategy of REP post-pruning combined with parallel generation of new background trees lets the model recover faster without excessive computational cost. The experimental results show that the concept drift detection and classification algorithm based on adaptive random forest achieves better drift detection and classification performance.

(3) Since parameter selection for the adaptive random forest algorithm is essentially a multimodal problem, a model parameter optimization strategy based on multimodal optimization is given. An improved adaptive random forest algorithm is then presented and applied to human activity recognition. Comparison with seven common online ensemble learning methods verifies the feasibility and effectiveness of the proposed algorithm for human activity recognition, and its performance is significantly better than that of the other recognition methods.

The thesis has 24 figures, 18 tables, and 173 references.
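
The splitting test behind item (1) can be illustrated with the standard Hoeffding bound used by Hoeffding trees. The following is a minimal sketch under assumptions, not the thesis's method: the abstract does not state the derived threshold, so the code shows only the textbook split test, and the names `hoeffding_bound`, `should_split`, `delta`, and `tie_threshold` are hypothetical choices made for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).

    value_range (R) is the range of the split metric, delta is the allowed
    probability of choosing the wrong attribute, n is the number of samples
    seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Textbook Hoeffding-tree split test (an assumption, not the thesis rule).

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon has shrunk below the tie threshold, meaning the two
    candidates are too close to be worth waiting for more samples.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # With 200 samples the bound is still wide, so a small gain gap does not split.
    print(hoeffding_bound(1.0, 1e-7, 200))
    print(should_split(0.30, 0.25, 1.0, 1e-7, 200))
```

Because the bound shrinks as `n` grows, a leaf that sees many samples eventually splits even on a narrow gap. In an imbalanced stream the minority class contributes few of those samples, which is one reason a threshold tuned for imbalance can matter.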
Keywords/Search Tags:Imbalanced learning, Online learning, Concept drift, Random forest, Multimodal optimization
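
The error-rate-based drift detection summarized in item (2) of the abstract can be sketched with a simple DDM-style monitor. This is an illustrative stand-in, not the detector defined in the thesis: the abstract does not describe how the classification error rate and the out-of-bag error are combined, so the code tracks a single online error rate, and the class name `ErrorRateDriftDetector` and its parameters (`warning_level`, `drift_level`, `min_samples`) are hypothetical.

```python
import math


class ErrorRateDriftDetector:
    """DDM-style monitor over a stream of prediction errors (illustrative only)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_samples=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        # Forget all statistics; called after a drift is signalled.
        self.n = 0
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate at the best point seen so far
        self.s_min = math.inf   # its standard deviation

    def update(self, error):
        """Feed one outcome (True = misclassified); return the detector state."""
        self.n += 1
        self.p += (float(error) - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    detector = ErrorRateDriftDetector()
    # Hypothetical stream: about 10% errors, then every prediction is wrong.
    stream = [i % 10 == 0 for i in range(200)] + [True] * 50
    states = [detector.update(e) for e in stream]
    print("drift signalled:", "drift" in states)
```

In an adaptive random forest, a "warning" state is typically the point at which a background tree starts training, and a "drift" state is when it replaces the outdated tree. The sketch leaves that response step out.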