
Breeding value estimation and quantitative trait loci detection by machine learning methods based on high dimensional single nucleotide polymorphisms dataset

Posted on: 2010-06-23
Degree: M.Sc.
Type: Thesis
University: University of Alberta (Canada)
Candidate: Wei, Wei
GTID: 2443390002977287
Subject: Computer Science
Abstract/Summary:
A quantitative trait locus (QTL) is a region of DNA that is associated with a particular phenotypic trait. QTL mapping is the statistical study that relates the alleles occurring at a locus to the associated phenotypes. If we knew the QTLs that affect economically important traits in dairy cattle breeding, we could greatly improve the estimation of breeding values, which would in turn lead to more accurate selection of dairy sires for breeding. With advances in DNA chip technology and the discovery of thousands of single nucleotide polymorphisms (SNPs) in genome-sequencing projects, we can now identify the QTLs associated with traits of interest based on SNP information.

We focus on a dataset from a dairy-industry breeding program, in which 1341 SNPs were genotyped for 462 dairy sires in order to predict 5 economically important traits. Our empirical results indicate that, with Gaussian Process (GP), our best predictor, the average correlation between the predicted and true values of these 5 traits is about 0.56. The results also suggest that the two kernel methods (GP and Support Vector Machine, SVM) outperform the other statistical methods on both the correlation and the root-mean-square error criteria. However, the feature selection methods we tried failed to identify the most relevant SNPs for the traits in this dataset.

In this study, we consider the challenge of learning, from the SNP dataset, the QTL mapping needed to predict important traits, whose predicted values are then turned into breeding values. This is especially challenging because of the high dimensionality of the dataset: there are roughly three times as many SNPs (1341) as sires (462). We examine two machine-learning kernel methods, Support Vector Machine (SVM) and Gaussian Process (GP), as well as several statistical methods, including partial least squares regression (PLS) and LASSO.
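The prediction setup described above (predicting a trait from a sires-by-SNPs genotype matrix and scoring the predictions by correlation and root-mean-square error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: SNPs are assumed to be coded as 0/1/2 allele counts, randomly generated genotypes and phenotypes stand in for the real dataset (only its shape is copied), and only the posterior mean of GP regression with an RBF kernel is implemented, using hypothetical fixed hyperparameters (`length_scale`, `noise_var`) rather than tuned ones. SVM, PLS and LASSO are not shown.

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    # Squared Euclidean distance between every row of A and every row of B.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def gp_predict(X_train, y_train, X_test, length_scale=30.0, noise_var=1.0):
    """GP regression posterior mean (zero-mean prior on centred targets).
    length_scale and noise_var are hypothetical, untuned values."""
    mu = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale)
    K[np.diag_indices_from(K)] += noise_var          # K + sigma^2 I
    alpha = np.linalg.solve(K, y_train - mu)         # (K + sigma^2 I)^-1 (y - mu)
    return mu + rbf_kernel(X_test, X_train, length_scale) @ alpha

def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in with the same shape as the thesis dataset (462 sires x 1341 SNPs).
rng = np.random.default_rng(0)
n_sires, n_snps = 462, 1341
X = rng.integers(0, 3, size=(n_sires, n_snps)).astype(float)
effects = np.zeros(n_snps)
qtl = rng.choice(n_snps, size=20, replace=False)    # hypothetical causal SNPs
effects[qtl] = rng.normal(0, 1, size=20)
y = X @ effects + rng.normal(0, 2.0, size=n_sires)

# Simple hold-out split (hypothetical; the thesis's validation scheme is not assumed).
idx = rng.permutation(n_sires)
train, test = idx[:370], idx[370:]
pred = gp_predict(X[train], y[train], X[test])
print(f"correlation={pearson_r(y[test], pred):.2f}  RMSE={rmse(y[test], pred):.2f}")
```

Correlation rewards getting the ranking of sires right, which is what selection depends on, while RMSE also penalizes bias and scale errors; reporting both, as the abstract does, guards against a predictor that ranks well but is badly calibrated.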
In addition to these predictors, we explore several feature selection techniques for identifying the SNPs associated with the QTLs that affect the predicted traits, including correlation-based feature selection, logic regression, M5' (M5 prime) for linear regression, and haplotype blocks.
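Correlation-based feature selection scores a subset of features by favouring SNPs that correlate with the trait but not with each other. A minimal sketch, assuming absolute Pearson correlations, the usual merit heuristic k·mean(r_cf) / sqrt(k + k(k−1)·mean(r_ff)), and a greedy forward search (the search strategy and settings used in the thesis are not stated here, so this may differ from them). The data are synthetic, with two hypothetical causal SNP columns (5 and 17) planted so the mechanism is visible; on the real dataset the abstract reports that these methods did not find the most relevant SNPs.

```python
import numpy as np

def standardize(A):
    """Z-score each column; constant columns become all zeros."""
    sd = A.std(axis=0)
    sd[sd == 0] = np.inf
    return (A - A.mean(axis=0)) / sd

def cfs_merit(r_cf, r_ff, subset):
    """Merit = k*mean(feature-trait corr) / sqrt(k + k(k-1)*mean(feature-feature corr))."""
    idx = np.asarray(subset)
    sub = r_ff[np.ix_(idx, idx)]
    off_diag_sum = sub.sum() - np.trace(sub)   # = k(k-1) * mean pairwise correlation
    return r_cf[idx].sum() / np.sqrt(len(idx) + off_diag_sum)

def cfs_forward(X, y, max_features=10):
    """Greedy forward search: add the SNP that most increases merit; stop when none does."""
    n, p = X.shape
    Xs, ys = standardize(X), standardize(y[:, None])[:, 0]
    r_cf = np.abs(Xs.T @ ys) / n          # |corr(SNP, trait)| for every SNP
    r_ff = np.abs(Xs.T @ Xs) / n          # |corr(SNP, SNP)| for every pair
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [j for j in range(p) if j not in selected]
        if not candidates:
            break
        merits = [cfs_merit(r_cf, r_ff, selected + [j]) for j in candidates]
        i = int(np.argmax(merits))
        if merits[i] <= best:
            break
        best = merits[i]
        selected.append(candidates[i])
    return selected, best

# Synthetic example: 462 sires x 200 SNPs, trait driven by hypothetical SNPs 5 and 17.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(462, 200)).astype(float)
y = 2.0 * X[:, 5] - 1.5 * X[:, 17] + rng.normal(0, 0.5, size=462)
selected, best_merit = cfs_forward(X, y)
print("selected SNP columns:", selected, "merit:", round(best_merit, 3))
```

The feature-feature term in the denominator is what distinguishes this from ranking SNPs one at a time: a SNP in strong linkage disequilibrium with one already selected adds little new information and is penalized accordingly.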
Keywords/Search Tags: QTL, Trait, Breeding, Methods, Feature selection, Dataset, Associated