
Statistical and machine learning frameworks for economics: Analysis of error curves and applications to derivatives pricing and credit risk assessment

Posted on: 1999-09-28
Degree: Ph.D.
Type: Dissertation
University: Harvard University
Candidate: Galindo-Flores, Jorge
Full Text: PDF
GTID: 1468390014968890
Subject: Statistics
Abstract/Summary:
By providing a new approach to the problem of statistical inference, machine and statistical learning methods have introduced a novel perspective for searching for functional relationships in data. Examples of machine and statistical learning methods are decision trees, neural networks, support vector machines, k-nearest neighbor classifiers, genetic algorithms, multivariate adaptive regression splines, and projection pursuit. The motivation of this dissertation is to explore the limitations and potential of the different methods, in particular those based on machine learning techniques. The objective of the study is twofold: on the one hand, it defines specific frameworks for comparative studies of different statistical and machine learning methods in the context of regression and classification analysis; on the other, it takes specific, well-known economic problems and applies these frameworks using the different algorithms.

Chapter 1 explains the methodology, which is based on the study of error curves: the behavior of the error as the sample size and the capacity (degrees of freedom) of each analytical method are varied. This methodology provides insight into the problem and allows us to address such questions as: How noisy is the data set? What is the trade-off between error rate and sample size? How can we overcome the overfitting problem? Which technique performs best?

We test the methodology in different environments: Chapter 2 in the context of univariate regression analysis by approximating polynomials; Chapter 3 increases the complexity of the problem and tests the methodology in the context of multivariate regression analysis by recovering the Black-Scholes call option pricing formula from noisy simulated data; Chapter 4 makes a comparative classification analysis on a real mortgage data set.

In all the exercises we find that the empirical forms of the error curves are similar to those implied by the theory: (1) as the sample size increases, ceteris paribus, the train (in-sample) error curve increases and the test (out-of-sample) error curve decreases, and (2) with the sample size held fixed, as the capacity (size) of the model increases, the error decreases, reaches a minimum, and then increases. This suggests a procedure for finding the optimal capacity (size) of the model, one that neither under- nor over-fits the problem. The error curve at the optimal capacity of each technique is the basis for the comparative analysis.
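The error-curve methodology can be reproduced in miniature. The sketch below is a hypothetical Python illustration, not code from the dissertation: it mimics the Chapter 2 style experiment by fitting polynomials of increasing degree (capacity) to noisy samples drawn from an assumed known cubic, then printing the train (in-sample) and test (out-of-sample) errors. Under these assumptions the test error traces the U shape the abstract describes.

    # Illustrative sketch of the error-curve methodology (assumed setup,
    # not the dissertation's actual code or data).
    import numpy as np

    rng = np.random.default_rng(0)

    def target(x):
        # Hypothetical ground-truth cubic used to generate the data.
        return 1.0 - 2.0 * x + 0.5 * x**3

    def make_data(n, noise=0.3):
        # Draw n inputs uniformly and add Gaussian noise to the target.
        x = rng.uniform(-2.0, 2.0, n)
        y = target(x) + rng.normal(0.0, noise, n)
        return x, y

    def mse(x, y, coeffs):
        # Mean squared error of the fitted polynomial on (x, y).
        return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

    x_tr, y_tr = make_data(100)    # fixed training sample size
    x_te, y_te = make_data(1000)   # large held-out test sample

    # Vary capacity (polynomial degree) at fixed sample size: train error
    # falls monotonically, while test error falls, bottoms out near the
    # true degree, then rises again as the model overfits.
    for degree in range(1, 13):
        coeffs = np.polyfit(x_tr, y_tr, degree)
        print(f"degree={degree:2d}  train={mse(x_tr, y_tr, coeffs):.4f}  "
              f"test={mse(x_te, y_te, coeffs):.4f}")

The sample-size axis of the methodology can be traced analogously: hold the degree fixed, vary the size of the training sample, and record both errors, which should reproduce finding (1) of the abstract.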
Keywords/Search Tags: Error, Statistical, Machine, Problem, Learning methods, Sample size, Frameworks, Capacity