
Model selection methods for genome wide association studies and statistical analysis of RNA seq data

Posted on: 2013-03-01
Degree: Ph.D
Type: Dissertation
University: University of Southern California
Candidate: Srivastava, Sudeep
Full Text: PDF
GTID: 1453390008473330
Subject: Biology
Abstract/Summary:
Genome-wide association studies are important tools for reconstructing the genotype-phenotype map and thereby understanding the genetic architecture of complex traits. With advances in genotyping and high-throughput sequencing technologies, millions of markers can be identified for individual populations in a very short time. Because complex phenotypes are controlled by multiple loci, there is great interest in testing markers simultaneously instead of one by one. In chapter 2, we use simulations to compare three model selection methods for genome-wide association studies: Stochastic Search Variable Selection (SSVS), the Least Absolute Shrinkage and Selection Operator (LASSO), and the Elastic Net. We apply the three methods to identify genetic variants that are associated with daunorubicin-induced cytotoxicity. We also compare the LASSO and SSVS on a dataset of two quantitative phenotypes related to rheumatoid arthritis.

In the second part of the dissertation, we propose a two-parameter generalized Poisson (GP) model for analyzing RNA-seq data. Deep sequencing of RNAs (RNA-seq) has been a useful tool for characterizing and quantifying transcriptomes. However, there are significant challenges in the analysis of RNA-seq data, such as how to separate signal from sequencing bias and how to perform a reasonable normalization. In chapter 4, we use the generalized Poisson model to separate the "true" expression level from the bias. We show that the GP model fits the data much better than the traditional Poisson model. Based on the GP model, we can improve the estimates of gene or exon expression, perform a more reasonable normalization across different samples, and improve the identification of differentially expressed genes and of differentially spliced exons. We also use a likelihood-based approach to estimate the expression levels of transcripts using the GP model, as discussed in chapter 5.

RNA sequencing and genome-wide association studies have led to rapid growth in the understanding of complex genetic phenotypes and diseases. These two methods are crucial tools in the genomic age for molecular biology, genomics, and population and quantitative genetics, among other fields. Using these tools effectively, with the help of statistical and algorithmic methods, would lead to rapid growth of knowledge in these fields and in biology as a whole.
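
To make the comparison between the Poisson and generalized Poisson models concrete, here is a minimal sketch in Python. It assumes the standard two-parameter generalized Poisson form P(X = x) = theta * (theta + lam * x)^(x - 1) * exp(-theta - lam * x) / x!, restricted to theta > 0 and 0 <= lam < 1; the abstract does not state the exact parameterization used in the dissertation, so the function names, the method-of-moments fit, and the toy counts below are all illustrative assumptions, not the author's method or data.

```python
import math


def gp_log_pmf(x, theta, lam):
    """Log-probability of count x under the generalized Poisson distribution.

    Assumed form: P(x) = theta * (theta + lam*x)**(x-1) * exp(-theta - lam*x) / x!
    With lam = 0 this reduces to the ordinary Poisson with mean theta.
    """
    if theta <= 0 or not (0 <= lam < 1):
        raise ValueError("this sketch assumes theta > 0 and 0 <= lam < 1")
    return (math.log(theta)
            + (x - 1) * math.log(theta + lam * x)
            - theta - lam * x
            - math.lgamma(x + 1))  # lgamma(x + 1) == log(x!)


def log_likelihood(counts, theta, lam):
    """Sum of log-probabilities over independent counts."""
    return sum(gp_log_pmf(x, theta, lam) for x in counts)


def moment_fit(counts):
    """Method-of-moments estimates (a simple stand-in for a full MLE).

    Uses mean = theta / (1 - lam) and variance = theta / (1 - lam)**3.
    Requires variance >= mean (overdispersed or equidispersed counts).
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((x - mean) ** 2 for x in counts) / n
    if var < mean:
        raise ValueError("moment fit in this sketch needs variance >= mean")
    lam = 1.0 - math.sqrt(mean / var)
    theta = mean * (1.0 - lam)
    return theta, lam


# Hypothetical, made-up overdispersed counts (not real RNA-seq data).
toy_counts = [0, 1, 1, 2, 3, 5, 8, 13, 21, 40]

theta_hat, lam_hat = moment_fit(toy_counts)
poisson_mean = sum(toy_counts) / len(toy_counts)  # Poisson MLE is the sample mean

gp_ll = log_likelihood(toy_counts, theta_hat, lam_hat)
poisson_ll = log_likelihood(toy_counts, poisson_mean, 0.0)  # lam = 0 is Poisson

print("GP estimates (theta, lam):", theta_hat, lam_hat)
print("GP log-likelihood:        ", gp_ll)
print("Poisson log-likelihood:   ", poisson_ll)
```

On counts whose variance far exceeds their mean, as in this toy example, the extra parameter lets the GP model absorb the overdispersion, so its log-likelihood is higher than that of the Poisson fit. A real analysis would fit both parameters by maximum likelihood and account for the additional parameter when comparing models.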
Keywords/Search Tags:Association studies, Methods, Model, Genome wide, RNA, Selection, Genetic