
Generalization error estimates and training data valuation

Posted on: 2003-11-16
Degree: Ph.D.
Type: Thesis
University: California Institute of Technology
Candidate: Nicholson, Alexander Marshall
Full Text: PDF
GTID: 2460390011984516
Subject: Computer Science
Abstract/Summary:
This thesis addresses several problems related to generalization in machine learning systems. We introduce a theoretical framework for studying learning and generalization. Within this framework, a closed form is derived for the expected generalization error, which estimates out-of-sample performance in terms of in-sample performance. We consider the problem of overfitting and show that, using a simple exhaustive learning algorithm, overfitting does not occur. These results do not assume a particular form of the target function, input distribution, or learning model, and they hold even with noisy data sets. We apply our analysis to practical learning systems, illustrate how it may be used to estimate out-of-sample errors in practice, and demonstrate that the resulting estimates improve upon errors estimated with a validation set on real-world problems.

Based on this study of generalization, we develop a technique for quantitative valuation of training data. We demonstrate that this valuation may be used to select training sets that improve generalization performance. With a reasonable prior over target functions, it further allows us to estimate the level of noise in a data set and provides for detection and correction of noise in individual examples. Finally, this data valuation can be used to classify new examples, yielding a new learning algorithm that is shown to be relatively robust to noise.
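The thesis derives its valuation in closed form within the framework summarized above; that derivation is not reproduced here. As a rough, illustrative sketch of the general idea of scoring training examples by their contribution to out-of-sample performance, the Python example below uses a simple leave-one-out scheme with a k-nearest-neighbour classifier. The function name, the choice of classifier, and the synthetic data are assumptions made for illustration only, not the method developed in the thesis.

```python
# Illustrative sketch only: score each training example by how much removing it
# changes error on a held-out set (a simple leave-one-out data valuation).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def loo_value(X_train, y_train, X_val, y_val, k=3):
    """Value of each training example: change in validation error when it is removed."""
    base = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    base_err = 1.0 - base.score(X_val, y_val)

    values = np.zeros(len(X_train))
    for i in range(len(X_train)):
        mask = np.arange(len(X_train)) != i          # drop example i
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train[mask], y_train[mask])
        err = 1.0 - model.score(X_val, y_val)
        values[i] = err - base_err                   # positive: example i was helpful
    return values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    y[:5] = 1 - y[:5]                                # inject label noise into a few examples
    vals = loo_value(X[:40], y[:40], X[40:], y[40:])
    print("Lowest-valued (likely noisy) training examples:", np.argsort(vals)[:5])
```

In this sketch, low or negative values flag examples whose removal reduces held-out error, which is one simple way to detect mislabeled points; the thesis's framework instead provides a principled, closed-form valuation with the additional uses described in the abstract.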
Keywords/Search Tags: Generalization, Data, Estimates, Training, Valuation