
Regularized Regression Learning Algorithms With Unbounded Sampling

Posted on: 2014-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: X R Chu
Full Text: PDF
GTID: 2268330425981106
Subject: Applied Mathematics
Abstract/Summary:
Regularized regression is one of the main research topics in statistical learning theory. Theoretical analyses of regression learning algorithms are mostly based on a uniform boundedness assumption on the output data. This standard boundedness condition fails in many cases, however, for example when the output follows a Gaussian distribution, and learning algorithms in the setting of unbounded sampling have attracted increasing attention in recent years. In 2010, C. Wang and D. X. Zhou imposed an unbounded condition on the moments of the output data and studied the regularized least-squares regression algorithm by the covering-number method. In this thesis we assume that the p-th moment of the output data is bounded for some p ≥ 2, p ∈ N. This unbounded condition is a natural generalization of the moment hypothesis, and an example is given in which the unbounded hypothesis holds but the moment hypothesis fails. All of the following analysis rests on this mild assumption of unbounded outputs. By means of integral-operator and error-decomposition techniques, we study the asymptotic convergence of the least-squares regularization algorithm and of the coefficient regularization algorithm with an ℓ² regularizer.

For coefficient regularization, we first discuss the coefficient regularized regression algorithm with an indefinite kernel. Using the sample operator, an explicit expression for f_z is derived; the generalization error is then split into three parts with the help of the regularizing function, and learning rates similar to those in the bounded-sampling setting are obtained. We also study semi-supervised coefficient regularization for regression with unbounded sampling. In the semi-supervised approach the hypothesis space and the learning algorithm are built from two different groups of input data: in practice a large amount of data is available, but only a small portion can be labeled easily, while the remaining, relatively large portion cannot. The semi-supervised approach makes full use of the information contained in both the labeled and the unlabeled data.

The asymptotic convergence of the least-squares regularized regression algorithm is also considered with unbounded sampling when the sampling process satisfies an α-mixing or φ-mixing condition. Using a probability inequality for strongly mixing samples, capacity-independent error bounds and learning rates are derived, and several phenomena are revealed. (1) A smoother regression function f_ρ yields better learning rates. (2) Stronger dependence between samples means the samples carry less information and therefore leads to worse rates. (3) The learning rates improve as the dependence between samples weakens and f_ρ becomes smoother, but beyond a certain point they improve no further; this is the saturation effect. (4) In the φ-mixing case the learning rates do not depend on the unboundedness parameter p, while in the α-mixing case, with t and r denoting the dependence between samples and the smoothness of the regression function respectively, the influence of the unbounded condition becomes weak when t > p/(p−2) and r ≥ 1/2.
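For orientation, the main objects discussed above can be written in a standard form. This is only a sketch: the exact normalization of the regularization parameter λ, the precise form of the unbounded condition, and the mixing conventions used in the thesis may differ in detail.

Unbounded (moment) condition on the output y, for some p ≥ 2:
\[
\mathbb{E}\bigl[\,|y|^{p}\mid x\,\bigr] \le M < \infty \quad \text{for almost every input } x .
\]
Least-squares regularization over the reproducing kernel Hilbert space \(\mathcal{H}_K\) of a kernel K, given a sample \(z=\{(x_i,y_i)\}_{i=1}^{m}\):
\[
f_{z,\lambda} \;=\; \arg\min_{f\in\mathcal{H}_K}\ \frac{1}{m}\sum_{i=1}^{m}\bigl(f(x_i)-y_i\bigr)^{2} \;+\; \lambda\,\|f\|_K^{2} .
\]
Coefficient regularization with an ℓ² regularizer, where the kernel need not be positive semi-definite:
\[
f_z \;=\; \sum_{j=1}^{m}\alpha_j^{z}\,K(x_j,\cdot),\qquad
\alpha^{z} \;=\; \arg\min_{\alpha\in\mathbb{R}^{m}}\ \frac{1}{m}\sum_{i=1}^{m}\Bigl(\sum_{j=1}^{m}\alpha_j K(x_j,x_i)-y_i\Bigr)^{2} \;+\; \lambda\sum_{j=1}^{m}\alpha_j^{2} .
\]
Mixing coefficients of the sampling process \(\{z_i\}_{i\ge 1}\), with \(\sigma_a^{b}\) the σ-algebra generated by \(z_a,\dots,z_b\):
\[
\alpha(k)=\sup_{j\ge 1}\ \sup_{A\in\sigma_1^{j},\,B\in\sigma_{j+k}^{\infty}}\bigl|P(A\cap B)-P(A)P(B)\bigr|,\qquad
\varphi(k)=\sup_{j\ge 1}\ \sup_{A\in\sigma_1^{j},\,P(A)>0,\,B\in\sigma_{j+k}^{\infty}}\bigl|P(B\mid A)-P(B)\bigr| .
\]
The dependence parameter t in the rates above is typically the exponent of a polynomial decay of these coefficients, for instance α(k) ≤ a k^{−t}; the precise convention is fixed in the thesis, not by this sketch.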
Moreover, when the parameter t is large enough, the learning rate obtained with unbounded sampling is as sharp as that with uniformly bounded sampling. The thesis also studies the regularized least-squares regression algorithm with i.i.d. sampling and gives a brief analysis of the leave-one-out method proposed by T. Zhang.
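As a small numerical illustration of the leave-one-out idea mentioned above (not the analysis given in the thesis), the following Python sketch computes the leave-one-out error of kernel ridge regression via the standard closed-form shortcut e_i/(1 − H_ii), avoiding m separate refits. The Gaussian kernel, the mλ normalization, the helper names gaussian_kernel and loo_error_krr, and the synthetic data are all assumptions made for the example.

import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Gram matrix of the Gaussian (RBF) kernel; the kernel choice is illustrative.
    d2 = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def loo_error_krr(X, y, lam, sigma=1.0):
    # Leave-one-out squared error of kernel ridge regression,
    # computed with the closed-form shortcut instead of m separate refits.
    m = len(y)
    K = gaussian_kernel(X, X, sigma)
    # Hat matrix H maps y to the in-sample fitted values f_z(x_i).
    H = K @ np.linalg.solve(K + m * lam * np.eye(m), np.eye(m))
    residuals = y - H @ y                            # in-sample residuals
    loo_residuals = residuals / (1.0 - np.diag(H))   # exact leave-one-out residuals
    return np.mean(loo_residuals**2)

if __name__ == "__main__":
    # Synthetic regression data with Gaussian (hence unbounded) noise.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(50, 1))
    y = np.sin(3.0 * X[:, 0]) + 0.5 * rng.standard_normal(50)
    for lam in (1e-3, 1e-2, 1e-1):
        print(f"lambda = {lam:.0e}, LOO error = {loo_error_krr(X, y, lam):.4f}")

The same shortcut is what makes the leave-one-out criterion cheap enough to use for selecting the regularization parameter λ.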
Keywords/Search Tags: learning theory, regression learning, sample error, approximation error, learning rate