Research On The Evaluation And Estimation Of The Complexity Of Statistical Learning Models

Posted on:2007-08-20

Degree:Master

Type:Thesis

Country:China

Candidate:Z Yang

GTID:2178360212485430

Subject:Control Science and Engineering

Abstract/Summary:

Statistical learning is a useful technology for a wide range of problems in the basic sciences, engineering, and business applications. It tries to extract a dependency rule from a finite number of examples, with the ultimate aim of predicting future unseen samples as accurately as possible. To find models that generalize well, effective criteria for evaluating model complexity and guiding model selection are crucial topics in statistical learning research. It has long been recognized that the Structural Risk Minimization (SRM) principle, based on the concept of the VC-dimension, provides an excellent means of selecting the complexity of a learning machine. Unfortunately, deriving an analytic expression for the VC-dimension directly from its definition is extremely difficult for all but a handful of learning machines. As pointed out by V.N. Vapnik, the only practicable approach is to estimate this quantity from empirical experiments. In this thesis, we propose a new method for estimating the VC-dimension through experimental procedures; unlike the previous method, it is not based on the supremum of risk discrepancies. Experimental results on learning machines whose VC-dimensions are theoretically known show that the estimates produced by the new method agree well with the theoretical values. More importantly, the method can be applied to more complicated learning machines for which the previous method cannot be used. In this study, we use Classification And Regression Trees (CART) as an example to demonstrate how the new method can estimate and evaluate the model complexity of CART with respect to the number of splits, and how it can assist the model selection (pruning) procedure of CART. Learning experiments on benchmark datasets show that, in both classification and regression tasks, this novel strategy for tree model selection performs better than alternative methods and comes very close to the best performance that any tree pruning criterion can attain.
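
As an illustration of what working "from the definition" of the VC-dimension involves, the following minimal Python sketch brute-forces the shattering test for a very simple hypothesis class: oriented thresholds on the real line, h(x) = s if x > t, else -s, with s in {+1, -1}. It is not the estimation method proposed in the thesis. The function names `threshold_labelings` and `is_shattered` are hypothetical and chosen for this example only, and the input points are assumed to be distinct.

```python
from itertools import product


def threshold_labelings(points):
    """Return every labeling of `points` realizable by an oriented threshold.

    Hypothesis class (an assumption of this sketch):
        h(x) = s if x > t else -s, with s in {+1, -1} and t real.
    """
    xs = sorted(set(points))
    # Only the position of t relative to the points matters, so it is enough
    # to try one threshold below all points, one between each pair of
    # neighbours, and one above all points.
    cuts = [xs[0] - 1.0]
    cuts += [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    cuts += [xs[-1] + 1.0]

    labelings = set()
    for t in cuts:
        for s in (1, -1):
            labelings.add(tuple(s if x > t else -s for x in points))
    return labelings


def is_shattered(points):
    """True if every one of the 2^n labelings of `points` is realizable."""
    realizable = threshold_labelings(points)
    all_labelings = product((1, -1), repeat=len(points))
    return all(lab in realizable for lab in all_labelings)


if __name__ == "__main__":
    print(is_shattered([0.0, 1.0]))       # two points
    print(is_shattered([0.0, 1.0, 2.0]))  # three points
```

For this class, two distinct points can be shattered but three cannot, because an alternating labeling of three points needs two sign changes; the VC-dimension is therefore 2. The enumeration grows as 2^n in the number of points and relies on knowing the hypothesis class in closed form, which is one way to see why a direct derivation becomes impractical for models such as CART.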

Keywords/Search Tags:

statistical learning, complexity criterion, model selection, VC-dimension, classification and regression trees (CART)

Related items

1	A statistical model to estimate cart push forces with known cart load weights
2	The Research For "Case Mix" Classification Based On Rough Set And Classification And Regression Trees
3	Support vector machine/regression feature selection with an application towards classification
4	Research On Ensemble Regression Learning Based On Classifier Selection And Multiple Kernel Selection Under Least Squares Framework
5	Statistical Learning Algorithms: Multi-class Classification And Regression With Non-i.i.d. Sampling
6	Bayesian classification using Bayesian additive and regression trees
7	Towards tractable parameter-free statistical learning
8	Bolstering CART and Bayesian variable selection methods for classification
9	Research On Feature Selection And Semi-Supervised Classification
10	Research On The Impact Of The Improved Sample Selection On Classification Algorithm