Class incremental learning has become a research hotspot in machine learning in recent years. Its defining feature is that the number of classes grows as the data scale expands during incremental learning. Class incremental learning over high-speed data streams is urgently needed in real-world fields such as data stream anomaly detection, network intrusion detection, system anomaly diagnosis, and financial market behavior monitoring. However, data streams change dynamically, arrive at high velocity, and are large in scale and high in dimension, which poses new and greater challenges to class incremental learning. First, a new class has few instances when it appears, which leads to an imbalanced distribution between the new class and the existing classes. This class imbalance seriously degrades the classification accuracy of class incremental learning over data streams. Second, data streams usually arrive at high speed, so class incremental learning must process them in real time; accessing too much historical data seriously impairs real-time processing. Third, real-world applications usually involve large-scale, high-dimensional data, and combined with the characteristics of data streams, this makes it difficult to process high-dimensional data in real time. To address these challenges, this paper proposes the Central-diffused Instance Generation Algorithm embedded Boosting (CdIGAB), the OCSVM based Nested Hierarchy Algorithm (ONHA), and Central-diffused Class Incremental Learning based on C-SVM over Data Stream (CdCIL). The main research contributions are as follows.

Solving class imbalance is the basis of class incremental learning over data streams. Most existing class imbalance learning methods focus on classification over static datasets. Expanding the data by using the distances between instances easily overfits uninformative data and leads to insufficient generalization. To solve this problem, we propose the Central-diffused Instance Generation Algorithm embedded Boosting (CdIGAB). CdIGAB diffuses random direction vectors from the center of the new class to expand the distribution of the minority class and effectively reduce the class imbalance rate. On this basis, we combine the method with AdaBoost.M2, which assigns unequal weights to data misclassified in earlier iterations, in order to reduce the variance and bias of the final ensemble. This provides a more general decision region for the minority class, significantly increases the diversity among the classifiers in the ensemble, and effectively guarantees the accuracy of class incremental learning. The experimental results show that CdIGAB fits the distribution of the new class better: classification accuracy increases by 10.34% on average and Macro-F1 by 13.13% on average.

Reducing the dependence on historical data while ensuring efficiency is the key to class incremental learning over data streams. Traditional incremental learning methods rely on historical data and often need to access it multiple times. To solve this problem, we propose the OCSVM based Nested Hierarchy Algorithm (ONHA). ONHA uses a one-class SVM (OCSVM) for better generalization and reuses the support vectors to keep the key samples while eliminating redundant ones, so that the classification model retains good data-fitting ability. As new classes continue to arrive, ONHA builds a hierarchical nested classification model. The experimental results show that data storage can be reduced by more than 70% and training time by about 40%.

To further improve the efficiency of class incremental learning, we adapt and combine the instance generation algorithm CdIGAB and the nested hierarchy algorithm ONHA, and propose Central-diffused Class Incremental Learning based on C-SVM over Data Stream (CdCIL). CdCIL reduces the negative effects of high-dimensional data by adopting random dimension diffusion. At the same time, it uses a dynamically adaptive cost-sensitive support vector machine that sets a different cost-sensitive factor for each class in the data stream. Updating the nested hierarchy model prevents the overfitting caused by high-dimensional data streams. Together, these measures guarantee the accuracy and efficiency of class incremental learning on high-dimensional data streams. The experimental results show that CdCIL adapts effectively to high-dimensional data streams in network flow anomaly detection and processes the data in real time. The training time of class incremental learning was reduced by 33.2% on average, and classification performance was reduced by 6% on average.
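
The central-diffused instance generation idea described above (diffusing random direction vectors from the center of the new class to enlarge the minority class) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`class_center`, `random_unit_vector`, `central_diffused_generate`), the toy data, and the choice to draw each step length uniformly up to the largest center-to-instance distance are all assumptions made for illustration, and the boosting stage is omitted.

```python
import math
import random


def class_center(points):
    """Return the coordinate-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def random_unit_vector(dim, rng):
    """Draw a random direction by normalizing a vector of Gaussian samples."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def central_diffused_generate(minority, n_new, seed=0):
    """Generate n_new synthetic points by diffusing outward from the class center.

    Assumption (not from the source): each step length is drawn uniformly
    from [0, r], where r is the largest center-to-instance distance.
    """
    rng = random.Random(seed)
    center = class_center(minority)
    radius = max(math.dist(center, p) for p in minority)
    synthetic = []
    for _ in range(n_new):
        direction = random_unit_vector(len(center), rng)
        step = rng.uniform(0.0, radius)
        synthetic.append([c + step * d for c, d in zip(center, direction)])
    return synthetic


# Hypothetical toy data: a new class with 3 instances against a class of 10.
minority = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]
majority_size = 10
new_points = central_diffused_generate(minority, majority_size - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))  # 10
```

Because the synthetic points are placed relative to the class center instead of being interpolated between pairs of existing instances, they are not tied to the distances between individual samples, which is the overfitting risk the abstract attributes to distance-based expansion.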