
Statistical methods for genome-enabled prediction of quantitative traits

Posted on: 2012-08-30
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Long, Nanye
Full Text: PDF
GTID: 1460390011961725
Subject: Biology
Abstract/Summary:
Genome-enabled prediction of quantitative traits (or genomic selection) uses dense genome-wide markers (typically single nucleotide polymorphisms, or SNPs) to predict the genetic values of individuals. In this context, this dissertation studies the following problems, with an emphasis on methodological aspects.

(1) Prediction of genetic values that contain non-additive effects, in addition to additive effects, is relevant to optimizing the performance of future generations. However, this problem has been less well studied than breeding value prediction. We propose to use nonparametric radial basis function (RBF) regression, a method that implicitly estimates all types of effects without increasing model dimensionality. Extensive simulation studies demonstrate the superiority of RBF regression over additive parametric regression methods in predicting genetic values that contain non-additive components.

(2) How does the performance (e.g., accuracy of selection) of genomic selection change over several cycles of selection? By simulation, we characterize the long-term behavior of genomic selection under directional or random selection, and also examine the effect of marker density. Additionally, a simple method is used to elucidate the contribution of family relationships to selection accuracy.

(3) Incorporating advanced machine learning techniques is essential for genomic studies, especially in an era of massive amounts of high-throughput data. We investigate the ability of two support vector regression models to predict milk yield in Holsteins and grain yield in wheat using dense markers. In particular, the impact of the kernel on a model's predictive ability is shown to be data-dependent.

(4) An alternative to regression on a large number of SNPs via Bayesian shrinkage methods is to reduce the data dimension by using latent variables. A subset of SNPs can also be selected at the same time. Two modified latent variable regression methods (supervised principal component regression and sparse partial least squares regression) are used to achieve both dimension reduction and variable selection. Through applications to milk yield prediction in Holsteins, we demonstrate their potential for accurate and cost-effective prediction of genomic breeding values.
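As a rough illustration of the idea in problem (1), the sketch below predicts a simulated genetic value that includes SNP-by-SNP interactions from genotype codes. It is not the dissertation's RBF regression method: it swaps in ordinary kernel ridge regression with a Gaussian (radial basis) kernel as a simple stand-in, and everything in it is an assumption made for the example, including the simulated genotypes, the effect sizes, the bandwidth of 1/p, the ridge penalty `lam`, and the helper names `gaussian_kernel` and `fit_predict`.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Return K with K[i, j] = exp(-bandwidth * ||A[i] - B[j]||^2)."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-bandwidth * np.maximum(sq_dist, 0.0))


def fit_predict(X_train, y_train, X_test, bandwidth, lam):
    """Kernel ridge regression: fit on the training set, predict the test set."""
    mu = y_train.mean()
    K = gaussian_kernel(X_train, X_train, bandwidth)
    # Solve (K + lam * I) alpha = y - mean(y) for the dual coefficients.
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + gaussian_kernel(X_test, X_train, bandwidth) @ alpha


# Hypothetical simulated data: 300 individuals, 50 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 300, 50
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)

# Assumed genetic value: two pairwise interaction terms plus one additive SNP.
g = X[:, 0] * X[:, 1] - X[:, 2] * X[:, 3] + 0.5 * X[:, 4]
y = g + rng.normal(scale=1.0, size=n)  # phenotype = genetic value + noise

train, test = slice(0, 200), slice(200, n)
pred = fit_predict(X[train], y[train], X[test], bandwidth=1.0 / p, lam=1.0)

# Correlation between predictions and the true simulated genetic values.
accuracy = np.corrcoef(pred, g[test])[0, 1]
print(f"correlation with true genetic value: {accuracy:.3f}")
```

The kernel never enumerates interaction terms explicitly, which is the sense in which a radial basis approach can capture non-additive effects without increasing model dimensionality. In practice the bandwidth and penalty would be tuned, for example by cross-validation, rather than fixed as they are here.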
Keywords/Search Tags:Prediction, Genomic, Selection, Methods, Regression, Values