
Statistical and machine learning frameworks for economics: Analysis of error curves and applications to derivatives pricing and credit risk assessment

Posted on: 1999-09-28
Degree: Ph.D.
Type: Dissertation
University: Harvard University
Candidate: Galindo-Flores, Jorge
Full Text: PDF
GTID: 1468390014968890
Subject: Statistics
Abstract/Summary:
By providing a new approach to the problem of statistical inference, machine and statistical learning methods have introduced a novel perspective for searching for functional relationships in data. Examples of machine and statistical learning methods are decision trees, neural networks, support vector machines, k-nearest neighbor classifiers, genetic algorithms, multivariate adaptive regression splines, and projection pursuit. The motivation of this dissertation is to explore the limitations and potential of the different methods, in particular those based on machine learning techniques. The objective of the study is twofold: on the one hand, it defines specific frameworks for comparative studies of different statistical and machine learning methods in the context of regression and classification analysis; on the other, it takes specific, well-known economic problems and applies these frameworks using the different algorithms.

Chapter 1 explains the methodology, which is based on the study of error curves: the behavior of the error as the sample size and the capacity (degrees of freedom) of each analytical method are varied. This methodology provides insight into the problem and allows us to address such questions as: How noisy is the data set? What is the trade-off between error rate and sample size? How can we overcome the overfitting problem? Which technique performs best?

We test the methodology in different environments: Chapter 2 in the context of univariate regression analysis by approximating polynomials; Chapter 3 increases the complexity of the problem and tests the methodology in the context of multivariate regression analysis by recovering the Black-Scholes call option pricing formula from noisy simulated data; Chapter 4 makes a comparative classification analysis on a real mortgage data set.

In all the exercises we find that the empirical forms of the error curves are similar to those implied by the theory: (1) as the sample size increases, ceteris paribus, the train (in-sample) error curve increases and the test (out-of-sample) error curve decreases, and (2) with the sample size held fixed, as the capacity (size) of the model increases, the error decreases, reaches a minimum, and then increases. This suggests a procedure for finding the optimal capacity (size) of the model, one that neither under- nor over-fits the problem. The error curve at the optimal capacity of each technique is the basis for the comparative analysis.
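The error-curve methodology can be reproduced in miniature. The sketch below is a hypothetical Python illustration, not code from the dissertation: it mimics the Chapter 2 style experiment by fitting polynomials of increasing degree (capacity) to noisy samples drawn from an assumed known cubic, then printing the train (in-sample) and test (out-of-sample) errors. Under these assumptions the test error traces the U shape the abstract describes.

    # Illustrative sketch of the error-curve methodology (assumed setup,
    # not the dissertation's actual code or data).
    import numpy as np

    rng = np.random.default_rng(0)

    def target(x):
        # Hypothetical ground-truth cubic used to generate the data.
        return 1.0 - 2.0 * x + 0.5 * x**3

    def make_data(n, noise=0.3):
        # Draw n inputs uniformly and add Gaussian noise to the target.
        x = rng.uniform(-2.0, 2.0, n)
        y = target(x) + rng.normal(0.0, noise, n)
        return x, y

    def mse(x, y, coeffs):
        # Mean squared error of the fitted polynomial on (x, y).
        return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

    x_tr, y_tr = make_data(100)    # fixed training sample size
    x_te, y_te = make_data(1000)   # large held-out test sample

    # Vary capacity (polynomial degree) at fixed sample size: train error
    # falls monotonically, while test error falls, bottoms out near the
    # true degree, then rises again as the model overfits.
    for degree in range(1, 13):
        coeffs = np.polyfit(x_tr, y_tr, degree)
        print(f"degree={degree:2d}  train={mse(x_tr, y_tr, coeffs):.4f}  "
              f"test={mse(x_te, y_te, coeffs):.4f}")

The sample-size axis of the methodology can be traced analogously: hold the degree fixed, vary the size of the training sample, and record both errors, which should reproduce finding (1) of the abstract.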
Keywords/Search Tags: Error, Statistical, Machine, Problem, Learning methods, Sample size, Frameworks, Capacity