On the SNP-based and Sequence-based Whole Genome Studies for Complex Traits

Posted on: 2013-02-25
Degree: Ph.D
Type: Dissertation
University: North Carolina State University
Candidate: Pongpanich, Monnat
Full Text: PDF
GTID: 1453390008987717
Subject: Biology
Abstract/Summary:
Quality control (QC) of the single nucleotide polymorphisms (SNPs) used in genome-wide association studies (GWAS) is essential to minimize potential false findings. SNP QC commonly uses expert-guided filters to exclude low-quality SNPs. Expert filters aim to remove SNPs that fall into the extremes of QC variables, including Hardy-Weinberg equilibrium, missing proportion (MSP), and minor allele frequency (MAF). However, implementations of these filters require arbitrary thresholds and do not jointly consider all QC features. We propose an algorithm based on principal component analysis and clustering analysis to detect low-quality SNPs. The method minimizes the use of arbitrary cutoff values, allows a collective consideration of the QC features, and provides conditional thresholds contingent on other QC variables. We compare the performance of our method to expert filters on datasets from the Wellcome Trust Case Control Consortium and the Genetic Association Information Network. Our results suggest that, with the same or fewer SNPs excluded, the proposed algorithm tends to give a similar or lower inflation factor of the test statistics (lambda), gives a reduced number of false associations, and retains all true associations.

GWAS methods that collapse information across genetic markers when searching for association signals are gaining momentum in the literature due to their usefulness in marker set analysis and in identifying rare variants. Collapsing information can be done at the genotype level, which focuses on the mean of genetic information, or at the similarity level, which focuses on the variance of genetic information. We seek to understand the strengths and weaknesses of these two collapsing paradigms. Our results show that neither collapsing strategy outperforms the other across all simulated scenarios. The signal-to-noise ratio and the underlying genetic architecture of the causal variants are the two factors that dominate their performance.
Genotype collapsing is more sensitive than similarity collapsing to the marker set being contaminated by noise loci. It performs best when the genetic architecture of the causal variants is not complex. Similarity collapsing is more robust and outperforms genotype collapsing when the genetic architecture of the marker set becomes more sophisticated, such as causal loci with various effect sizes or frequencies. In addition, we consider a two-stage analysis that combines the two top-performing methods from different collapsing strategies and find that it is reasonably robust across all simulated scenarios.

RNA-Seq is a promising approach for understanding transcriptomes due to its accuracy, large dynamic range of expression level, and ability to detect novel transcripts. The first step prior to any data analysis is to map reads against a reference genome or transcript set using an alignment tool, e.g., TopHat. In many experiments, a non-trivial number of reads are unmapped and excluded from downstream analyses. To maximize the potential utility of sequenced reads, we propose a method of incorporating these unmapped reads in testing for differentially expressed (DE) genes. Specifically, we use BLAST to align the unmapped reads and assign a weight to each mapped read that reflects the mapping confidence. Gene expression is estimated from the summation of weights of the reads mapped to a gene. To test the general utility of the proposed approach, we construct a simple statistical method and show that using weights improves the power to detect DE genes while still controlling the false discovery rate. In addition, we examine the characteristics of the reads not mapped by TopHat and find that not only the beginning region of the reads but also the tail region causes problems with the alignment.
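
The weighted expression estimate described for the RNA-Seq work (gene expression as the sum of the weights of the reads mapped to a gene) can be illustrated with a minimal Python sketch. The abstract does not say how the weights are derived from the alignments, so the function name `weighted_gene_expression`, the gene labels, and the example weights are hypothetical, and the weights are simply assumed to be given as values between 0 and 1.

```python
from collections import defaultdict


def weighted_gene_expression(read_assignments):
    """Sum per-read weights for each gene.

    read_assignments: iterable of (gene_id, weight) pairs.
    The weight is assumed to lie in [0, 1] and to reflect mapping
    confidence; how it is computed is NOT specified in the abstract.
    """
    expression = defaultdict(float)
    for gene_id, weight in read_assignments:
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight out of range for {gene_id}: {weight}")
        expression[gene_id] += weight
    return dict(expression)


# Hypothetical input: confidently mapped reads carry weight 1.0,
# while reads recovered by a second alignment pass carry lower weights.
reads = [
    ("geneA", 1.0),
    ("geneA", 1.0),
    ("geneA", 0.5),   # recovered read, moderate confidence (made-up value)
    ("geneB", 1.0),
    ("geneB", 0.25),  # recovered read, low confidence (made-up value)
]

expression = weighted_gene_expression(reads)
print(expression)
```

With every weight fixed at 1.0 this reduces to an ordinary read count, so the weighting only changes the estimate for genes that receive recovered reads.
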
Keywords/Search Tags: Reads, SNPs, Collapsing