
Research On The Evaluation And Estimation Of The Complexity Of Statistical Learning Models

Posted on:2007-08-20
Degree:Master
Type:Thesis
Country:China
Candidate:Z Yang
Full Text:PDF
GTID:2178360212485430
Subject:Control Science and Engineering
Abstract/Summary:
Statistical learning is a useful technology for a wide range of problems in the basic sciences, engineering, and business applications. It tries to extract the dependency rule from a finite number of examples, with the ultimate aim of predicting future unseen samples as accurately as possible. To find models with good generalization ability, effective criteria for evaluating model complexity and guiding model selection are crucial topics in statistical learning research. It has long been recognized that the Structural Risk Minimization (SRM) principle, based on the concept of VC-dimension, provides an excellent means of selecting the complexity of a learning machine. Unfortunately, deriving an analytic expression for the VC-dimension from its definition is extremely difficult for all but a handful of learning machines. As pointed out by V.N. Vapnik, the only practicable approach is to estimate this quantity experimentally. In this thesis, we propose a new method for estimating the VC-dimension through experimental procedures; unlike the previous method, it is not based on the supremum of risk discrepancies. Experimental results on learning machines whose VC-dimensions are theoretically known show that the estimates produced by the new method agree well with the theoretical values. More importantly, the method can be applied to more complicated learning machines for which the previous method cannot be employed. In this study, we use Classification And Regression Trees (CART) as an example to demonstrate how the new method can estimate and evaluate the model complexity of CART with respect to the number of splits, and how it can assist the model selection (pruning) procedure of CART.
Learning experiments on benchmark datasets show that, in both classification and regression tasks, this novel strategy for tree model selection performs better than alternative methods and comes very close to the ideal performance that any tree pruning criterion can attain.
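To make the SRM idea in the abstract concrete, the following is a minimal Python sketch of complexity selection with Vapnik's classical bound for classification, R <= R_emp + sqrt((h(ln(2n/h) + 1) - ln(eta/4)) / n), where h is the VC-dimension, n the sample size, and 1 - eta the confidence level. It is not the estimation method proposed in the thesis. The function names, the candidate trees, and their VC-dimensions and empirical risks are hypothetical values chosen only for illustration; in practice h would come from an estimate such as the one the thesis describes.

```python
import math


def vc_confidence(h, n, eta=0.05):
    """VC confidence term of Vapnik's bound for VC-dimension h and n samples.

    The bound holds with probability at least 1 - eta.
    """
    if not 0 < h <= n:
        raise ValueError("require 0 < h <= n")
    if not 0 < eta < 1:
        raise ValueError("require 0 < eta < 1")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


def guaranteed_risk(emp_risk, h, n, eta=0.05):
    """Upper bound on the true risk: empirical risk plus VC confidence."""
    return emp_risk + vc_confidence(h, n, eta)


def select_by_srm(candidates, n, eta=0.05):
    """Return the (label, h, emp_risk) candidate with the smallest bound."""
    return min(candidates, key=lambda c: guaranteed_risk(c[2], c[1], n, eta))


# Hypothetical nested family of trees: (label, assumed VC-dimension,
# assumed empirical risk). These numbers are illustrative, not measured.
candidates = [
    ("1 split", 2, 0.30),
    ("4 splits", 8, 0.15),
    ("16 splits", 40, 0.05),
    ("64 splits", 200, 0.01),
]
n_samples = 1000

best = select_by_srm(candidates, n_samples)
for label, h, emp in candidates:
    bound = guaranteed_risk(emp, h, n_samples)
    print(f"{label:>10}: h={h:<4} R_emp={emp:.2f} bound={bound:.3f}")
print("selected:", best[0])
```

With these assumed values the empirical risk keeps falling as splits are added, while the confidence term grows with h, so the bound is minimized at an intermediate tree size. This trade-off is what a complexity estimate is meant to inform during pruning.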
Keywords/Search Tags:statistical learning, complexity criterion, model selection, VC-dimension, classification and regression trees (CART)