
Mathematical programming for statistical learning with applications in biology and finance

Posted on: 2010-07-24
Degree: Ph.D.
Type: Dissertation
University: Princeton University
Candidate: Luss, Ronny
Full Text: PDF
GTID: 1448390002982845
Subject: Operations Research
Abstract/Summary:
The primary focus of this work is optimization algorithms for statistical learning tools and, in particular, the development and implementation of large-scale algorithms for sparse principal component analysis (PCA) and kernel optimization.

Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of variance in the data while having only a limited number of nonzero coefficients. We first enhance a recent first order algorithm for a semidefinite relaxation to sparse PCA using numerically cheaper approximate gradients, allowing us to work with larger data sets. These results are applied to some classic clustering and feature selection problems arising in biology.

We next examine classification problems, specifically using support vector machines (SVM), which are heavily dependent on the choice of an input kernel matrix. Kernel learning seeks to improve classification performance by minimizing an upper bound on test error over a set of kernel matrices. Current classification methods, such as SVM, require positive semidefinite kernel matrices. We first address this limitation using kernel learning to incorporate indefinite kernels into SVM, and describe algorithms to solve the resulting convex eigenvalue optimization problem. We then discuss algorithms for another recent kernel learning problem called multiple kernel learning (MKL), which learns kernels as convex combinations of predefined positive semidefinite matrices.

The final part of this work shows how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. Multiple kernel learning is used to combine equity returns with text as predictive features in order to improve classification performance. Predictability is observed in the occurrence of abnormal returns, but not in their direction.
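
As a rough illustration of the MKL idea mentioned in the abstract (kernels formed as convex combinations of predefined positive semidefinite matrices), the sketch below builds such a combination in plain Python. It is not the dissertation's algorithm: the weights are fixed by hand rather than learned, and the names `combine_kernels`, `quadratic_form`, the toy points `X`, and the two base kernels are all assumptions made for the example. The point it shows is that nonnegative weights summing to one keep the combined matrix positive semidefinite, since v'Kv is then a weighted average of nonnegative terms.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Return sum_i weights[i] * kernels[i], requiring weights on the simplex."""
    if not kernels or len(kernels) != len(weights):
        raise ValueError("need one weight per kernel matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for K, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * K[i][j]
    return combined


def quadratic_form(K: Matrix, v: Sequence[float]) -> float:
    """Compute v' K v; this is nonnegative for every v when K is PSD."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))


# Hypothetical toy data: three points in the plane.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two base kernels that are PSD by construction:
# a linear kernel (Gram matrix of X) and the identity matrix.
linear = [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]
identity = [[1.0 if i == j else 0.0 for j in range(len(X))] for i in range(len(X))]

# Hand-picked convex weights; an MKL method would learn these instead.
K = combine_kernels([linear, identity], [0.25, 0.75])
```

The resulting `K` could be passed to any SVM solver that accepts a precomputed kernel matrix; choosing the weights to minimize a bound on test error is the part that MKL itself addresses.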
Keywords/Search Tags:Kernel learning, Algorithms, Classification