A computational genomics study: Characterizing genomic variants in non-coding regions of the human genome

Posted on: 2013-02-10
Degree: Ph.D
Type: Thesis
University: Yale University
Candidate: Mu, Xinmeng
Full Text: PDF
GTID: 2450390008977106
Subject: Bioinformatics
Abstract/Summary:
Advances in next-generation sequencing technology have paved the way for comprehensively cataloguing genomic variants from whole-genome sequencing of personal genomes. At the same time, novel methods have emerged for systematically detecting functional elements using high-throughput functional genomics approaches, such as chromatin immunoprecipitation followed by sequencing. Among these genomic elements, the functional relevance of those that reside in non-coding regions of the genome is poorly understood, despite the fact that the vast majority of the genome is non-coding and functionally important. In this thesis, we aim to systematically characterize genomic variants in non-coding elements of the human genome using computational genomics approaches. We first characterized genomic variants at single-nucleotide resolution, focusing on one of the least studied classes of variants, structural variants (SVs). We built a computational tool, BreakSeq, which leverages sequences at SV breakpoints for SV re-identification in personal genomes, formation mechanism classification, ancestral state analysis, and feature computation. Analysis using BreakSeq revealed the extent, distribution, and mutational landscape of SVs in the human genome. Next, to shed light on the functional relevance of non-coding genomic elements, we carried out an integrative analysis of the full spectrum of genomic variants, including single nucleotide polymorphisms (SNPs), short insertions and deletions (indels), and SVs, in human non-coding elements. We used population-based metrics on genomic variant patterns to measure selective constraints in various classes and subclasses of non-coding elements, including transcription factor (TF) binding sites, non-coding RNAs, non-coding gene regions, and pseudogenes. We found that selection pressure on non-coding elements is weaker than that on coding sequences and stronger than that on neutral sequences.
We also found differential selective constraints among element subclasses grouped by genomic properties. To further probe intra-element differences, we developed an element-aware aggregation procedure for SNP and indel diversity, built on the principle of block bootstrapping. Moreover, we used permutation tests to examine how SVs interact with functional elements through different modes and formation mechanisms. Lastly, we investigated mutational patterns in the context of a TF-regulatory network. We found that the more highly connected TFs in the network tend to be more sensitive to sequence changes. Interestingly, we further found that TF-binding sites that show allele-specific behavior tend to be under more relaxed constraints than other TF-binding sites.
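The block bootstrapping principle mentioned above can be illustrated with a minimal Python sketch. This is not the thesis's element-aware aggregation procedure, whose details are not given here; it is a generic moving-block bootstrap of a mean over position-ordered diversity values. The function names, the block size, and the percentile interval are all assumptions chosen for illustration. Resampling contiguous blocks rather than single positions preserves local correlation between neighboring sites.

```python
import random


def block_bootstrap_means(values, block_size, n_resamples=1000, seed=0):
    """Moving-block bootstrap of the mean of a position-ordered series.

    `values` is a hypothetical list of per-position diversity scores.
    Each resample concatenates randomly chosen contiguous blocks until
    it reaches the original length, then records the mean.
    """
    n = len(values)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(values)")
    rng = random.Random(seed)
    n_blocks = -(-n // block_size)  # ceiling division
    max_start = n - block_size
    means = []
    for _ in range(n_resamples):
        sample = []
        for _ in range(n_blocks):
            start = rng.randint(0, max_start)  # inclusive on both ends
            sample.extend(values[start:start + block_size])
        sample = sample[:n]  # trim overshoot from the last block
        means.append(sum(sample) / n)
    return means


def percentile_interval(means, alpha=0.05):
    """Simple percentile interval from bootstrap means (an assumption)."""
    ordered = sorted(means)
    last = len(ordered) - 1
    return ordered[int((alpha / 2) * last)], ordered[int((1 - alpha / 2) * last)]


if __name__ == "__main__":
    # Toy, made-up scores used only to exercise the functions.
    toy_scores = [0.0, 1.0] * 50
    boot = block_bootstrap_means(toy_scores, block_size=10, n_resamples=200)
    print(percentile_interval(boot))
```

In practice the block size would be chosen to match the scale over which neighboring positions are correlated; the value used here is arbitrary.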
Keywords/Search Tags:Genomic variants, Non-coding, Genome, Human, Regions, Computational, Found