Machine learning approaches to understanding the genetic basis of complex traits

Posted on:2010-01-05

Degree:Ph.D.

Type:Thesis

University:Stanford University

Candidate:Lee, Su-In

Full Text:PDF

GTID:2448390002984148

Subject:Biology

Abstract/Summary:

Humans differ in many observable qualities, termed 'phenotypes', ranging from appearance to disease susceptibility. Many phenotypes are largely determined by each individual's specific 'genotype', stored in the 3.2 billion bases of his or her genome sequence. Deciphering the genome sequence by finding which sequence variations affect a certain phenotype would have a great impact on human life. The recent advent of high-throughput genotyping methods has enabled retrieval of an individual's sequence information on a genome-wide scale. Classical approaches have focused on finding a significant correlation between a sequence variation S and a particular phenotype P from the genotype and phenotype data. However, it is difficult to directly infer causal relationships between S and P from limited data, because of (1) the complexity of the cellular mechanisms through which S causes P, and (2) environmental factors that are not necessarily measurable.

In this dissertation, we present machine learning approaches that address these challenges by explicitly modeling an intermediate process between the genotype and phenotype. More specifically, we model the genetic regulatory mechanisms that are induced by sequence variations and that lead to the phenotype, and we learn the model from genome-wide mRNA expression measurements. Using the learned model, we aim to generate a finer-grained hypothesis such as: a sequence variation S induces regulatory interactions R, which lead to changes in the phenotype P.

To achieve this goal, our approach utilizes sophisticated machine learning techniques that can robustly select relevant biological interactions among a large number of possible interactions and can efficiently solve the optimization problem from a large amount of data. For example, our 'meta-prior algorithm' can learn the regulatory potential of each sequence variation based on its intrinsic characteristics, and this improvement helps to identify a true causal sequence variation among a large number of variations in the same chromosomal region. Our approaches have led to novel insights into sequence variations, and some of the hypotheses have been validated through biological experiments. Some of the machine learning techniques developed for biological problems are generally applicable to a wide-ranging set of applications such as collaborative filtering and natural language processing.

Keywords/Search Tags:

Machine learning, Phenotype, Approaches, Sequence

Related items

1	Machine Learning Approaches to Biological Sequence and Phenotype Data Analysis
2	Applications Of Machine Learning Approaches To Biological Sequence Analysis
3	Arabic handwriting recognition using machine learning approaches
4	Machine Learning Approaches to Provide Spatio-Temporal Characterization of Human Brain Functional Activities
5	Machine learning approaches for dealing with limited bilingual training data in statistical machine translation
6	Data Analysis For High Content RNA Interference Screening: Pattern Recognition Approaches For Certain Systems Biology Application
7	Machine learning approaches for determining effective seeds for k-means algorithm
8	Research On Named Entity Extraction Method For Symptom Phenotype
9	Analysis And Application Of The Gene Semantic Similarity Based On Disease Phenotype
10	Efficient Large-Scale Machine Learning Algorithms for Genomic Sequence