Topics in statistical learning

Posted on: 2010-02-12
Degree: Ph.D
Type: Thesis
University: Stanford University
Candidate: Hofling, Holger
Full Text: PDF
GTID: 2448390002982913
Subject: Statistics
Abstract/Summary:
In this thesis, we explore several topics in machine learning, with special attention to applications to biological data.

In the first part, we analyze the pre-validation method. Given a predictor of outcomes derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor of disease outcome to standard clinical predictors on the same dataset. We study pre-validation analytically to determine whether the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforward "one degree of freedom" analytical test can be biased, and we propose a permutation test to remedy this problem. In simulation studies, we show that the permutation test has the correct nominal level and achieves roughly the same power as the analytical test.

The second part considers the problems of estimating both the parameters and the structure of binary-valued Markov networks. To maximize the penalized log-likelihood, we implement an approximate procedure based on the pseudo-likelihood as defined by Besag and generalize it to a fast exact algorithm. Our results show that this procedure is faster than a competing exact method. We also find that the approximate pseudo-likelihood is much faster than the exact methods and only slightly less accurate.

Finally, we develop a path algorithm for the Fused Lasso, an extension of the Lasso model. The Fused Lasso adds an L1 penalty with parameter lambda2 on the differences of neighboring coefficients in the Lasso model, assuming the coefficients have a natural ordering. The algorithm calculates the whole solution path for the lambda2 penalty with lambda1 fixed. We also develop special versions for certain interesting cases that can be solved very efficiently.
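
For orientation, the Fused Lasso criterion mentioned in the abstract is commonly written in the Lagrangian form below. This is a sketch of the standard formulation, not a quotation from the thesis: the symbols y_i (response), x_ij (predictors), beta_j (coefficients), n (observations) and p (ordered coefficients) are notation assumed here, and lambda1 and lambda2 appear as \lambda_1 and \lambda_2.

```latex
\hat{\beta} \;=\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}}
\;\frac{1}{2} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\;+\; \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
\;+\; \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert
```

Setting \lambda_2 = 0 recovers the ordinary Lasso. The path algorithm described in the abstract holds \lambda_1 fixed and traces the solution as \lambda_2 varies.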
Keywords/Search Tags:Topics, Special