Font Size: a A A

Online Learning Algorithms For Classification Of Streaming Data

Posted on:2019-11-22Degree:DoctorType:Dissertation
Country:ChinaCandidate:T T ZhaiFull Text:PDF
GTID:1368330572468837Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Classification of streaming data is one of the most important tasks in streaming data mining,which aims to discover the latest relationship between the input variables and the label variable from an endless stream so as to predict accurately the labels of the instances that may arrive anytime.The online learning paradigm,as an incremen-tal machine learning technique,is an effective tool for streaming data classification.In recent years,with the rise of large-scale applications,the online learning paradigm has begun to receive widespread attention and application.This paper focuses on on-line learning algorithms for classification of high-dimensional and evolving streaming data.Specifically,for handling the challenge of "curse of dimensionality" posed by high-dimensional streaming data,and also the challenge of "concept drifting" posed by evolving streaming data,this paper studies the online feature selection problem,the sparse online learning problem,and the resource-efficient online ensemble learning problem,respectively.The main work and innovations of this paper include:1.Two novel online feature selection algorithms based on adaptive sub-gradient meth-ods are proposed,which achieve sparsity by using to-based truncation techniques.The relevance of features is assessed by taking into account the weight of features in the current predictor and the frequency of features in the history of predictions.By taking as input a desired budget,both algorithms are easy to control,especially in comparison with 1-based feature selection techniques.A detailed regret analy-sis for both algorithms is provided.Experiments on six high-dimensional datasets show the superiority of the proposed algorithms compared with state-of-the-art on-line feature selection algorithms and sparse online learning algorithms based on ?l regularization.2.The shifting regret bounds of three classic gradient descent algorithms in the dy-namic environment are analyzed.The analyses provide a new insight that the step-size choice takes a crucial part in the adaptability to concept drifts.Specifically,for gradient descent techniques,a better adaptability to concept drifts can be achieved using constant step-sizes rather than standard decreasing step-sizes.Empirical evi-dence for the influence of the step-size choice is also provided.Based on the above theoretical and experimental results,a sparse approximated linear classification al-gorithm is proposed,which uses constant step-size so that better adaptability to concept drifts is achieved.At each round of learning,the algorithm first performs a gradient descent to get an intermediate solution,then finds a new sparse solution close to the intermediate solution,which involves solving a non-con vex optimiza-tion problem.It is proved that the non-convex optimization problem can be solved efficiently by a simple greedy truncation method.A truncation error parameter can control how close the intermediate solution to the new solution,therefore it controls the sparsity of the model changing from no sparsity to complete sparsity continu-ously.The shifting regret bound of the proposed algorithm is given and analyzed,and a large number of experiments have demonstrated the superiority of the pro-posed algorithm over both stationary and evolving streaming data compared with state-of-the-art sparse online learning algorithms.3.A resource-efficient online ensemble algorithm for classification is proposed.The algorithm uses BPegasos,an online kernelized SVM-based algorithm,as the com-ponent classifier,for handling the scalability problem and the small sample prob-lem in each concept posed by high-dimensional streams,and it takes full advan-tage of the characteristics of BPegasos to cope with various types of concept drifts.Specifically,the algorithm constructs diverse component classifiers by controlling the budget size of BPegasos;it also equips each component classifier with a drift detector for monitoring and evaluating its most recent performance;when the drift detectors detect that some components have degraded severely,a drift alarm is trig-gered,then the worst performing component and its corresponding drift detector are reset so as to learn from scratch on the most recent data.Lastly,we prove ex-perimentally that our proposed ensemble algorithm is more effective in prequential accuracy and more resource-efficient than state-of-the-art Hoeffding tree ensembles on high-dimensional streaming data;Comprehensive experiments on synthetic and real-life datasets also show that EBPegasos can cope with various types of concept drifts significantly better than advanced ensemble algorithms when all ensembles use BPegasos as the component classifier.
Keywords/Search Tags:Online Learning, Streaming Data Classification, Curse of Dimensionality, High-dimensional Streaming Data, Feature Selection, Sparse Online Learning, Con-cept Drift, Evolving Streaming Data, Online Ensemble
PDF Full Text Request
Related items