With the rapid development of information technology, various fields have accumulated large data sets that grow over time, leading to an increasing number of large-scale classification problems. Research on large-scale nonlinear classification algorithms is an essential topic in machine learning, pattern recognition, and data mining. Online random feature learning algorithms are an important class of methods for addressing large-scale nonlinear classification problems. However, as data arrive continuously, these algorithms update only the hyperplane normal vector while keeping the random feature directions fixed; the kernel function is therefore fixed as well, and it is difficult to ensure that the online learner continuously adapts to changes in the data. To address these deficiencies, this thesis systematically investigates large-scale nonlinear classification problems from the perspective of online adaptive kernel learning, covering balanced classification, imbalanced classification, classification under concept drift, and imbalanced classification under concept drift. The main work of this thesis is as follows:

1) In the field of support vector machines, online random feature learning is one of the important approaches to large-scale nonlinear classification. Traditional online random feature learning methods update only the hyperplane normal vector during learning while keeping the random feature directions fixed, which may result in an improper kernel and degrade classification performance. To address this issue, we propose the Online Adaptive Kernel Learning with Random Features algorithm (RF-OAK). RF-OAK makes kernel adjustment more flexible by iteratively updating the random feature directions, so that the model can better adapt to changes in the data. Experimental results on public data sets show that, in terms of test accuracy, RF-OAK outperforms
baseline shallow online learning methods and is comparable to offline deep learning algorithms; in terms of learning speed, RF-OAK is faster than offline deep learning algorithms.

2) Cost-sensitive online learning algorithms are an important class of methods for large-scale imbalanced classification problems, but they have two main limitations. First, existing methods emphasize the accuracy of the minority class while ignoring the accuracy of the majority class. Second, in multi-class problems there may be multiple majority classes and multiple minority classes, which makes it very difficult to distinguish between classes and to assign a different misclassification cost to each. To address these issues, we propose two Cost-sensitive Online Adaptive Kernel learning algorithms (COAK). Specifically, by introducing a novel misclassification cost that balances accuracy between the minority and majority classes, we first design a cost-sensitive online adaptive kernel learning algorithm for imbalanced binary classification, and then extend the binary method to multi-class tasks. Experimental results on public data sets show that, compared with baseline online imbalanced learning algorithms, the proposed COAKB and COAKM significantly improve overall classification performance while maintaining the accuracy of the minority class.

3) For concept drift, existing algorithms either cannot achieve high classification accuracy or require excessive computation time and memory to obtain satisfactory results. To solve these problems, we propose the Concept Drift Adaptation with Continuous Kernel Learning algorithm (ACKL). Specifically, a new mathematical model is first proposed to tackle the performance degradation caused by concept drift. Then, based on the proposed objective function, a continuous kernel learning approach is applied to handle possible distribution changes as the
samples arrive one by one. Finally, an ensemble method based on a majority voting strategy is used to address the parameter sensitivity issue. Experimental results on both simulated and real data demonstrate that ACKL outperforms the benchmark algorithms.

4) For imbalanced data streams with concept drift, existing algorithms are mainly based on drift detectors, resampling, and ensemble methods. However, these methods either cannot handle dynamic imbalanced classification well or are too time-consuming to be suitable for high-speed data streams. To solve these problems, we propose the Cost-sensitive Continuous Ensemble Kernel Learning algorithm (CCEKL). Specifically, a novel misclassification cost function is first introduced to address dynamic imbalanced classification. Second, based on the modified loss function, the continuous kernel learning method is employed to adapt to continuously arriving data. Finally, an ensemble method is used to address the sensitivity of the algorithm to the initial kernel width. To further improve efficiency, parallel computing is used to train multiple classifiers with different initial kernel widths simultaneously. Experimental results on both simulated and real data show that CCEKL achieves better performance than baseline algorithms with less training time.
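The idea common to the algorithms summarized above is that, as each sample arrives, the learner updates not only the hyperplane normal vector but also the random feature directions, so the implicit kernel adapts to the data. The following is a minimal illustrative sketch of that idea, not the thesis's actual RF-OAK algorithm: the random Fourier feature map, hinge loss, step sizes, and toy data below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff(X, U, b):
    """Random Fourier feature map z(x) = sqrt(2/D) * cos(U x + b)."""
    D = U.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ U.T + b)

def train_online(X, y, D=64, eta_w=0.1, eta_u=0.01, sigma=1.0):
    """One pass over a stream (X, y), labels in {-1, +1}.

    Unlike fixed-kernel online random feature learning, both the
    normal vector w and the feature directions U are updated by
    gradient steps on the hinge loss, so the kernel can adapt.
    """
    n, d = X.shape
    U = rng.normal(0.0, 1.0 / sigma, size=(D, d))   # spectral draws for an RBF kernel
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    w = np.zeros(D)
    for x, yi in zip(X, y):
        phase = U @ x + b
        z = np.sqrt(2.0 / D) * np.cos(phase)
        if yi * (w @ z) < 1.0:                      # hinge loss is active
            # dz_j/du_j = -sqrt(2/D) * sin(phase_j) * x; step decreases the loss
            grad_U = (yi * w * -np.sqrt(2.0 / D) * np.sin(phase))[:, None] * x
            U += eta_u * grad_U                     # adapt feature directions (kernel)
            w += eta_w * yi * z                     # adapt hyperplane normal vector
    return w, U, b

# Toy usage: two well-separated Gaussian blobs, presented in random order.
X = np.vstack([rng.normal(-2.0, 1.0, size=(200, 2)),
               rng.normal(2.0, 1.0, size=(200, 2))])
y = np.hstack([-np.ones(200), np.ones(200)])
perm = rng.permutation(len(y))
w, U, b = train_online(X[perm], y[perm])
acc = float(np.mean(np.sign(rff(X, U, b) @ w) == y))
```

The per-sample cost is O(D d), so the sketch keeps the linear-in-stream-length complexity that makes random feature methods attractive at scale; the thesis's algorithms additionally handle class imbalance, concept drift, and ensemble-based parameter robustness on top of this basic adaptive update.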