
Study On Two Online Algorithms For Large Scale Streaming Data

Posted on: 2020-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: F H Gao
GTID: 2518306500483624
Subject: Statistics
Abstract/Summary:
Regression and classification are two widely studied problems in machine learning and data mining. As information technology advances, more and more data are produced as continuous streams, known as streaming data. Concept drift is an important problem in streaming data mining. At present, online learning is the most commonly used approach for handling streaming data and its concept drift. Within the online learning framework, designing online versions of classical batch learning models is an effective way to mine streaming data. Existing online learning algorithms for kernel ridge regression are all incremental, which strains both storage space and computational efficiency. To address the storage and real-time updating problems of the kernel ridge regression model, we propose a budgeted online kernel ridge regression algorithm. A fixed budget strategy, implemented through a minimum contribution criterion, prevents the storage space from overflowing, while matrix correction techniques and the Sherman-Morrison-Woodbury formula improve the efficiency of model updates. Many existing online multi-classification algorithms can update the model efficiently, but their poor noise tolerance makes it difficult to achieve high classification accuracy on noisy streaming data. To address this, we design a noise-resilient online multi-classification algorithm by introducing an adaptive ramp loss, which improves classification accuracy on streaming data under noise interference. Numerical experiments on several benchmark data sets and practical application data sets verify the effectiveness and potential value of the proposed algorithms and provide a basis for further research and application.
Keywords/Search Tags: Online learning, Concept drift, Kernel ridge regression, Noise-resilient, Multi-classification
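The abstract names two building blocks without giving formulas: the Sherman-Morrison-Woodbury formula for cheap model updates, and a ramp loss for noise tolerance. The sketch below is illustrative only and is not the thesis's algorithm. It shows the rank-one special case of the formula (Sherman-Morrison), which updates a known inverse in O(n^2) operations instead of re-inverting, and the standard, non-adaptive ramp loss, which is a hinge loss clipped at 1 - s. The function names, the clipping parameter `s`, and its default value are assumptions made for this example; the budget strategy and the adaptive choice of the ramp parameter described in the abstract are not reproduced here.

```python
def sherman_morrison_update(a_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1} as a list of rows.

    Rank-one case of the Sherman-Morrison-Woodbury formula:
    (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)
    """
    n = len(a_inv)
    # A^{-1} u (column vector) and v^T A^{-1} (row vector)
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    v_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[i] * a_inv_u[i] for i in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("the rank-one update makes the matrix singular")
    return [
        [a_inv[i][j] - a_inv_u[i] * v_a_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]


def hinge_loss(margin):
    """Standard hinge loss; grows without bound as the margin decreases."""
    return max(0.0, 1.0 - margin)


def ramp_loss(margin, s=-1.0):
    """Hinge loss clipped at 1 - s (assumes s < 1).

    A badly misclassified, possibly mislabeled, sample contributes at
    most 1 - s, which limits the influence of label noise.
    """
    return min(1.0 - s, hinge_loss(margin))


# Example: A = I, so A^{-1} = I; adding u v^T gives diag(2, 1).
identity = [[1.0, 0.0], [0.0, 1.0]]
updated_inverse = sherman_morrison_update(identity, [1.0, 0.0], [1.0, 0.0])
```

In this example `updated_inverse` is the inverse of diag(2, 1). For a sample with margin -10, the hinge loss is 11 while the ramp loss with `s = -1` is capped at 2, which is the sense in which a ramp loss is less sensitive to outliers.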