
Statistical Learning Algorithms: Multi-class Classification And Regression With Non-i.i.d. Sampling

Posted on: 2010-04-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z W Pan
GTID: 1118360275455460
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Learning theory is an interdisciplinary research field involving applied mathematics, statistics, computer science, computational biology and data mining. It aims at learning function features (such as function values and variables) or data structures from samples by means of learning algorithms. Its main research topics include the design of efficient algorithms for various purposes and the theoretical analysis of learning methods. In this thesis we consider two research problems.

The first is to propose a new learning algorithm for multi-class classification by Parzen windows, and to provide both theoretical understanding and applications of this algorithm. This Parzen windows classifier improves on the usual way of designing multi-class classifiers by combining binary classifiers, which is often complex and suffers from overlapping regions. We give convergence rates for the excess misclassification error under regularity conditions on the conditional probability distributions and decay conditions on the marginal distribution near the boundary of the input space. In the literature on Parzen windows for density estimation and regression, the approximation error is estimated locally at points in the interior of the input space X, away from the boundary. Our key contribution to the mathematical analysis is to show how the decay of the marginal distribution near the boundary yields satisfactory error bounds in the L^1 or C(X) norms taken globally on the whole input space.

The second research problem considered in this thesis is the study of learning algorithms with non-i.i.d. sampling. The algorithms include least squares regularized regression and binary classification. In the last few years there have been significant developments in the theoretical understanding of learning algorithms with i.i.d. sampling, but either independence or identical sampling is a rather restrictive assumption in real data analysis, as in Shannon sampling, randomized sampling and weakly dependent sampling. Our setting requires neither independence nor identical distributions. Under the conditions that the sequence of marginal distributions for sampling converges exponentially fast in the dual of a Hölder space and that the sampling process satisfies a polynomial strong mixing condition, we derive capacity-independent learning rates. Our convergence rate is consistent with that of the i.i.d. setting when the mixing condition parameter tends to zero. For a binary classification learning algorithm with non-identical sampling, we also derive satisfactory capacity-dependent estimates for the excess misclassification error.
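To make the first algorithm concrete, here is a minimal Python sketch of a Parzen windows multi-class classifier in the plug-in style the abstract describes: each class receives a kernel-weighted vote from the training points, and the label with the largest vote is predicted. The Gaussian window, the bandwidth parameter h, and the function name are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def parzen_window_classify(X_train, y_train, X_test, h=0.5):
    """Multi-class Parzen windows classifier (illustrative sketch).

    For a query point x, class c receives the vote
    sum of K((x - x_i) / h) over training points x_i with label c,
    and the predicted label is the class with the largest vote.
    A Gaussian window K is assumed here; the abstract does not
    commit to a particular window function.
    """
    classes = np.unique(y_train)
    preds = np.empty(len(X_test), dtype=classes.dtype)
    for t, x in enumerate(X_test):
        d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances to training points
        w = np.exp(-d2 / (2.0 * h ** 2))          # Gaussian window weights
        votes = [w[y_train == c].sum() for c in classes]
        preds[t] = classes[int(np.argmax(votes))]
    return preds
```

The bandwidth h plays the role of the window width in the error analysis: the convergence rates mentioned above balance the approximation error (small h) against the sample error (large h), with the boundary decay condition controlling the global L^1 or C(X) bounds.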
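For the second problem, the algorithm under analysis is least squares regularized regression in a reproducing kernel Hilbert space. Below is a minimal sketch, assuming a Gaussian kernel and illustrative parameter names lam and sigma: by the representer theorem, the regularized minimizer reduces to a linear system in the kernel Gram matrix. The non-i.i.d. analysis in the thesis concerns the sampling process, not the algorithm itself, which is unchanged.

```python
import numpy as np

def regularized_least_squares(X, y, lam=1e-2, sigma=1.0):
    """Least squares regularized regression in an RKHS (illustrative sketch).

    Solves  min_f  (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2
    over a Gaussian-kernel RKHS. The representer theorem gives
    f(x) = sum_i alpha_i * K(x, x_i)  with  (K + lam * m * I) alpha = y.
    """
    m = len(X)
    # Gram matrix K_ij = exp(-|x_i - x_j|^2 / (2 sigma^2))
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    alpha = np.linalg.solve(K + lam * m * np.eye(m), y)

    def f(x_new):
        # evaluate the learned function at a single new point x_new
        d2_new = np.sum((x_new[None, :] - X) ** 2, axis=-1)
        return np.exp(-d2_new / (2.0 * sigma ** 2)) @ alpha

    return f
```

The learning rates derived in the thesis describe how fast such an estimator approaches the regression function when, instead of i.i.d. samples, the marginal distributions converge exponentially fast and the sampling process satisfies a polynomial strong mixing condition.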
Keywords/Search Tags: Learning theory, Error decomposition, Reproducing kernel Hilbert space, Multi-class classification algorithm, Regression algorithm, Approximation, Non-i.i.d. sampling, Riemannian manifolds