| With the arrival of the big data era, the application scope of streaming data keeps expanding. In most cases, streaming data is collected in a non-stationary environment and has complex characteristics such as unbounded length, temporal ordering, dynamic evolution, and non-reproducibility, all of which challenge the modeling and implementation of streaming data mining. Because streaming data is time-ordered, it may change in unforeseeable ways over time. When the data distribution changes over time, concept drift occurs; when the set of object categories changes over time, concept evolution occurs. To address concept drift and concept evolution in streaming data collected in non-stationary environments, this paper proposes decision-tree-based methods for accelerated adaptation to both, in order to improve the real-time response speed and learning efficiency of online learning models. The specific research contents are as follows. (1) For concept evolution caused by category changes, an accelerated adaptation method based on an online incremental tree is proposed. First, the decision tree is initialized by loading historical data. Following the principle of high cohesion within a class, the feature space is divided into a normal space and an abnormal space to separate samples of easily identifiable classes from abnormal samples. On this basis, a feature coefficient is calculated to measure the degree of class confusion among the samples at each leaf node, and this measure is used to detect concept evolution. Finally, feature entropy is used to re-select the splitting attribute and splitting value at each leaf node, which realizes online growth and dynamic pruning of the decision tree and improves the convergence rate of the model after concept evolution occurs. The model uses key historical data together with the latest post-evolution data to extract information near the point where concept evolution occurs, which enables dynamic growth of the decision tree, accelerates convergence of the online learning model after concept evolution, and improves its generalization performance. (2) For concept drift caused by changes in the data distribution, an adaptive concept drift method based on an online weighted decision forest is proposed. The method combines a weighted decision forest with the online incremental tree to accelerate the model's adaptation to the new data distribution after concept drift. On the one hand, a weighted decision forest is constructed to learn the latest data distribution, and the weights of its decision trees are updated accordingly to adapt to distribution changes in the stream. On the other hand, effective information about the overall distribution of the streaming data is extracted by combining the historical data distribution with the latest samples, and incremental learning is carried out with the online incremental tree to improve the robustness of the model. The method converges rapidly to the new data distribution after concept drift occurs and improves the generalization performance of the online learning model. This paper proposes the concepts of feature entropy and feature coefficient, which effectively measure the degree of class disorder among the samples at a leaf node, support efficient detection of concept evolution, and allow the decision tree model to converge adaptively on data that has undergone concept evolution. In addition, combining the weighted decision forest with the online incremental tree effectively improves the convergence rate of the model on the newly distributed data after concept drift, providing a feasible solution for streaming data mining. |
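
The abstract names two mechanisms without defining them: a measure of class disorder at a leaf node, and per-tree weights in a decision forest that are updated as the stream changes. The sketch below illustrates both ideas under loud assumptions. It uses plain Shannon entropy over class labels as a stand-in for the paper's feature entropy and feature coefficient, and a multiplicative penalty in the style of the classic weighted-majority rule as a stand-in for the paper's weight update. The function names `label_entropy` and `update_tree_weights` and the parameter `beta` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the class labels observed at one leaf.

    Assumption: a stand-in for the paper's feature entropy / feature
    coefficient, whose exact definitions the abstract does not give.
    A high value means the leaf mixes several classes, which may hint
    that a new class has appeared in that region of the feature space.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def update_tree_weights(weights, tree_was_correct, beta=0.5):
    """Down-weight trees that misclassified the latest sample.

    Assumption: a weighted-majority style rule (multiply by beta on a
    mistake, then renormalize), not the paper's actual update.
    """
    scaled = [w if ok else w * beta for w, ok in zip(weights, tree_was_correct)]
    total = sum(scaled)
    return [w / total for w in scaled]


if __name__ == "__main__":
    # A pure leaf versus a leaf that has started receiving a second class.
    print(label_entropy(["a", "a", "a", "a"]))
    print(label_entropy(["a", "a", "b", "b"]))
    # Two equally weighted trees; the second one just made a mistake.
    print(update_tree_weights([0.5, 0.5], [True, False]))
```

In a streaming setting, one might compare a leaf's entropy against a threshold to decide when to re-split it, and call the weight update once per labeled sample; both choices are illustrative rather than taken from the paper.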