
Statistical methods for high-dimensional genomic data

Posted on: 2010-05-29
Degree: Ph.D.
Type: Thesis
University: Harvard University
Candidate: Wu, Michael Chiao-An
GTID: 2440390002479812
Subject: Biology
Abstract/Summary:
High-throughput genomic studies hold great promise for providing insight into key biological and medical problems, but the high dimensionality of the data they produce poses a major challenge for researchers. This thesis addresses some of the methodological challenges posed by high-dimensional genomic data. First, the need for accurate classifiers based on genomic markers motivated the development of sparse linear discriminant analysis (sLDA), a regularized form of linear discriminant analysis that performs classification and variable selection simultaneously. The second and third chapters are concerned with multifeature testing. In the gene expression setting, we apply sLDA to test for differential expression of gene pathways, using the sLDA weights to reduce each pathway to a univariate score whose significance is evaluated by permutation. For genome-wide association studies, we use the logistic kernel machine-based testing framework to evaluate the significance of SNPs grouped by their proximity to known genomic features. Finally, the last chapter studies the use of sparse regularized regression for inference in high-dimensional data. Specifically, we develop a parametric permutation test based on the LASSO estimator for testing the effects of individual markers in "omics" settings.
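The pathway test summarized above (reduce a pathway to one score per sample, then assess significance by permutation) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. Several pieces are assumptions made for illustration. The sLDA weights are replaced by a hypothetical stand-in, `mean_difference_weights`, which takes the per-gene difference in class means and applies no sparsity penalty. The test statistic is the absolute difference in mean pathway score between the two groups. `X` holds one row of pathway-gene expression values per sample, and `y` holds 0/1 group labels. The weights are re-estimated inside every permutation, so the null distribution reflects that they were fit to the same data.

```python
import random
from statistics import mean


def mean_difference_weights(X, y):
    """Hypothetical stand-in for sLDA weights: per-gene difference in
    class means (no sparsity or covariance regularization)."""
    g1 = [row for row, lab in zip(X, y) if lab == 1]
    g0 = [row for row, lab in zip(X, y) if lab == 0]
    return [mean(r[j] for r in g1) - mean(r[j] for r in g0)
            for j in range(len(X[0]))]


def pathway_statistic(X, y):
    """Reduce the pathway to a univariate score per sample, then compare
    the mean score of the two groups."""
    w = mean_difference_weights(X, y)
    scores = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    s1 = mean(s for s, lab in zip(scores, y) if lab == 1)
    s0 = mean(s for s, lab in zip(scores, y) if lab == 0)
    return abs(s1 - s0)


def pathway_permutation_test(X, y, n_perm=1000, seed=0):
    """Permutation p-value for the pathway statistic (labels shuffled,
    group sizes preserved)."""
    rng = random.Random(seed)
    observed = pathway_statistic(X, y)
    labels = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Weights are re-fit on the permuted labels, so the null
        # distribution includes the weight-estimation step.
        if pathway_statistic(X, labels) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

For example, `pathway_permutation_test(X, y, n_perm=1000)` on a small pathway whose genes are shifted in one group should give a small p-value. Re-fitting the weights inside the loop is the key design choice. If the observed weights were reused across permutations, the test would ignore that they were tuned to the true labels.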
Keywords/Search Tags: Genomic, Data