
Statistical Learning Algorithms: Multi-class Classification And Regression With Non-i.i.d. Sampling

Posted on: 2010-04-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z W Pan
GTID: 1118360275455460
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Learning theory is an interdisciplinary research field involving applied mathematics, statistics, computer science, computational biology and data mining. It aims at learning function features (such as function values and variables) or data structures from samples by means of learning algorithms. Its main research topics include the design of efficient algorithms for various purposes and the theoretical analysis of learning methods. In this thesis we consider two research problems.

The first is to propose a new learning algorithm for multi-class classification by Parzen windows, and to provide both theoretical understanding and applications of this algorithm. This Parzen windows classifier improves on the usual way of designing multi-class classifiers by combining binary classifiers, which is often complex and suffers from overlapping regions. We give convergence rates for the excess misclassification error under regularity conditions on the conditional probability distributions and decay conditions on the marginal distribution near the boundary of the input space. In the literature on Parzen windows for density estimation and regression, the approximation error is estimated locally at points in the interior of the input space X, away from the boundary. Our key contribution to the mathematical analysis is to show how the decay of the marginal distribution near the boundary yields satisfactory error bounds in the L^1 or C(X) norms taken globally on the whole input space.

The second research problem considered in this thesis is the study of learning algorithms with non-i.i.d. sampling. The algorithms include least squares regularized regression and binary classification. In the last few years there have been significant developments in the theoretical understanding of learning algorithms with i.i.d. sampling, but either independence or identical sampling is a rather restrictive assumption in real data analysis, as in Shannon sampling, randomized sampling and weakly dependent sampling. Our setting requires neither independence nor identical distributions. Under the conditions that the sequence of marginal distributions for sampling converges exponentially fast in the dual of a Hölder space and that the sampling process satisfies a polynomial strong mixing condition, we derive capacity-independent learning rates. Our convergence rate is consistent with that of the i.i.d. setting when the mixing condition parameter tends to zero. For a binary classification learning algorithm with non-identical sampling, we also derive satisfactory capacity-dependent estimates for the excess misclassification error.
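To make the first algorithm concrete, here is a minimal Python sketch of a Parzen windows multi-class classifier in the plug-in style the abstract describes: each class receives a kernel-weighted vote from the training points, and the label with the largest vote is predicted. The Gaussian window, the bandwidth parameter h, and the function name are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def parzen_window_classify(X_train, y_train, X_test, h=0.5):
    """Multi-class Parzen windows classifier (illustrative sketch).

    For a query point x, class c receives the vote
    sum of K((x - x_i) / h) over training points x_i with label c,
    and the predicted label is the class with the largest vote.
    A Gaussian window K is assumed here; the abstract does not
    commit to a particular window function.
    """
    classes = np.unique(y_train)
    preds = np.empty(len(X_test), dtype=classes.dtype)
    for t, x in enumerate(X_test):
        d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances to training points
        w = np.exp(-d2 / (2.0 * h ** 2))          # Gaussian window weights
        votes = [w[y_train == c].sum() for c in classes]
        preds[t] = classes[int(np.argmax(votes))]
    return preds
```

The bandwidth h plays the role of the window width in the error analysis: the convergence rates mentioned above balance the approximation error (small h) against the sample error (large h), with the boundary decay condition controlling the global L^1 or C(X) bounds.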
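For the second problem, the algorithm under analysis is least squares regularized regression in a reproducing kernel Hilbert space. Below is a minimal sketch, assuming a Gaussian kernel and illustrative parameter names lam and sigma: by the representer theorem, the regularized minimizer reduces to a linear system in the kernel Gram matrix. The non-i.i.d. analysis in the thesis concerns the sampling process, not the algorithm itself, which is unchanged.

```python
import numpy as np

def regularized_least_squares(X, y, lam=1e-2, sigma=1.0):
    """Least squares regularized regression in an RKHS (illustrative sketch).

    Solves  min_f  (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2
    over a Gaussian-kernel RKHS. The representer theorem gives
    f(x) = sum_i alpha_i * K(x, x_i)  with  (K + lam * m * I) alpha = y.
    """
    m = len(X)
    # Gram matrix K_ij = exp(-|x_i - x_j|^2 / (2 sigma^2))
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    alpha = np.linalg.solve(K + lam * m * np.eye(m), y)

    def f(x_new):
        # evaluate the learned function at a single new point x_new
        d2_new = np.sum((x_new[None, :] - X) ** 2, axis=-1)
        return np.exp(-d2_new / (2.0 * sigma ** 2)) @ alpha

    return f
```

The learning rates derived in the thesis describe how fast such an estimator approaches the regression function when, instead of i.i.d. samples, the marginal distributions converge exponentially fast and the sampling process satisfies a polynomial strong mixing condition.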
Keywords/Search Tags: Learning theory, Error decomposition, Reproducing kernel Hilbert space, Multi-class classification algorithm, Regression algorithm, Approximation, Non-i.i.d. sampling, Riemannian manifolds