
Integration of Machine Learning, Network Science and Pathway Analysis in Genetic Epidemiology

Posted on: 2015-10-06
Degree: Ph.D.
Type: Dissertation
University: Dartmouth College
Candidate: Pan, Qinxin
Full Text: PDF
GTID: 1478390017491723
Subject: Biology
Abstract/Summary:
Although genome-wide association studies (GWAS) and other high-throughput initiatives have led to an information explosion in human genetics and genetic epidemiology, the mapping from genotype to phenotype remains challenging, as most of the identified loci have only moderate effect sizes. Epistasis, a ubiquitous phenomenon, is believed to account for a portion of the presumed missing heritability. The term epistasis refers to non-additive effects among multiple genetic variants. Machine learning methods have been developed to detect epistasis, and Random Forest (RF) is a popular one among them. Meanwhile, networks have emerged as a popular tool for systematically characterizing the space of pairwise interactions, which makes them a well-suited framework for modeling interactions. Unlike machine learning methods, which identify risk-associated genes, pathway analysis highlights risk-associated pathways, which possess higher explanatory power. However, most extant pathway analysis methods ignore epistasis and treat each pathway independently. Here we integrate machine learning, network science, and pathway analysis to detect epistasis and to address epistasis in pathway analysis. This work includes guiding random forest with an interaction network for epistasis detection, examining the significance of epistasis in pathway analysis, developing pathway analysis approaches that take epistasis into account, and identifying risk-associated pathway interactions. Applications to population-based genetic studies of bladder cancer and Alzheimer's disease demonstrate the validity and potential of these approaches.
Keywords/Search Tags:Pathway, Genetic, Machine learning, Epistasis, Network