
Decoding the multifactorial nature of mutation rate variation in the human genome using computational and statistical approaches

Posted on: 2013-08-05
Degree: Ph.D
Type: Dissertation
University: The Pennsylvania State University
Candidate: Ananda, Guruprasad
Full Text: PDF
GTID: 1450390008472303
Subject: Bioinformatics
Abstract/Summary:
Whole genome sequencing and resequencing projects have provided a rich source for studying mutations. There is now substantial evidence indicating regional variation and co-variation of rates of nucleotide substitutions, insertions, deletions, and repeat number alterations of microsatellites in the human genome. This knowledge has advanced our understanding of mutagenesis and has proved to be a useful resource for mining the functional genomics landscape. Despite this, a thorough characterization (structure, causes, geography, and implications) of mutation rate variation and co-variation is lacking. Moreover, although the rapid rise of next generation sequencing (NGS) data has enriched our understanding of substitutions, insertions, and deletions, microsatellites have largely been sidelined due to the technical difficulties associated with their identification and genotyping from NGS short-read data. This hinders our understanding of the mutational properties of microsatellites, which are among the most variable genomic sequences and are implicated in numerous diseases.

In this dissertation, I develop and use reproducible bioinformatics and statistical tools to study various facets of rate variation and co-variation of mutations along the human genome using whole genome primate alignments (Chapters 1 and 2), as well as to identify and model the mutational behavior of microsatellites in human populations using the 1000 Genomes Project data (Chapters 3 and 4). I ask and provide answers to the following detailed questions:

1. How do rates of different mutation types co-vary in the human genome? And what determines their co-variation? Rates of substitutions, short insertions, and short deletions show strong linear co-variation, and genomic landscape features influence the structure and strength of this co-variation. Microsatellite mutability varies orthogonally.

2. Can we define and characterize typical states of mutation rate variation? I identified six states with various combinations of elevated or depressed mutation rates. These states differ in their prevalence, lengths, genomic locations, and associations with genomic landscape features, and influence the localization of genes and functional marks.

3. When does a short tandem repeat (TR) turn into a highly mutable microsatellite, and what factors influence this switch? Not only the absolute levels of polymorphism, but also the rate of exponential growth of polymorphism incidence with repeat number influences the propensity of a TR to turn into a microsatellite. The change points occur at repeat numbers 9, 5, 4, and 4 for mono-, di-, tri-, and tetranucleotide TRs, respectively.

4. What are the main sources of errors associated with identification and genotyping of TRs from short-read data? And are these errors influenced by a TR's intrinsic properties? PCR slippage errors might constitute a large part of TR-associated errors. Error rates are strongly influenced by intrinsic properties of TRs, including (i) motif size, (ii) motif composition, and (iii) repeat number.

Results from this dissertation contribute to understanding the mechanisms and rate variations of multiple mutation types, and have important implications for identification and screening of functional elements and disease-causing mutations. Importantly, this dissertation contributes tools to the scientific community via Galaxy, an open-source genomics portal, and thus facilitates future large-scale genomics studies.
Keywords/Search Tags:Genome, Mutation, Using, Genomic