
A Protein Domain-Centric View of Human Genetic Disease

Posted on: 2017-04-30
Degree: Ph.D.
Type: Dissertation
University: University of Maryland, Baltimore County
Candidate: Peterson, Thomas Andrew
GTID: 1454390008950625
Subject: Bioinformatics
Abstract/Summary:
Variations and similarities in our genomes are part of our history, our heritage, and our identity. Some variants are associated with traits such as hair and eye color, while others are associated with susceptibility to disease or response to drug treatment. Identifying the human variations that produce clinically relevant phenotypic changes is critical for accurate, personalized diagnosis, prognosis, and treatment of disease. Several resources collect and store human variations in easily accessible databases, but these remain incomplete because of our limited knowledge of how variants affect human health. In recent years, the low cost of sequencing technologies, coupled with increasing computational power, has enabled computational methods that predict which human genetic variants are involved in disease. This is crucial for the analysis of rare or de novo variants, for which frequency-based analyses often fail, and for somatic variants of unknown significance in sequenced tumor samples. However, most methods for predicting deleterious variants only predict whether a variant is deleterious; they do not identify the causative molecular disruption. Moreover, traditional methods are 'gene-centric': they focus on specific genes and make no comparison to paralogs or to genes sharing a common protein domain, which may contain relevant information about the gene family's role in disease. In this work, we seek to develop tools and computational methods for a protein domain-centric view of genetic variants, to provide insight into the molecular perturbations at conserved structural and functional residues involved in human disease. Firstly, we present Domain Mapping of Disease Mutations (DMDM), a web-based tool designed to visualize disease-associated genetic variants for genes, proteins, and protein domains.
Secondly, in 'Domain Landscapes of Somatic Mutations in Cancer', we show the power of using protein domain regions in the analysis of somatic variants to identify which sub-regions within a gene are important, and we highlight cancer-type-specific, heterogeneous mutation signatures for domain regions within the same gene. Finally, we present three studies that align genes via common protein domains to identify clusters of variants that occur at specific domain positions more frequently than expected by chance, or 'protein domain hotspots'. The first of these studies finds that human disease variants cluster at specific positions in protein domains, and that these positions overlap significantly with conserved and functional sites. In the second study, we find that this property is shared by phenotypically altering genetic variants in yeast, and we draw parallels between yeast and human mutational patterns. In the third study, applied to cancer, we identify protein domain hotspots in somatic tumor samples using a new statistical correction needed for population-level data. Here, we define "oncodomains" as families of protein domains in which somatic variants from one or more genes form a hotspot. We show not only that somatic variants form hotspots, but also that the location and intensity of the hotspots can be heterogeneous between cancer types, and that the protein domain framework is ideal for assessing the functional significance of rare variants in cancer. The novel methods described in this work provide insight into the molecular perturbations leading to disease by using the structural and functional framework of protein domains. We expect these tools to complement existing methods because of their unique ability to compare similar functional disruptions across several human genes in the same family.
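To make the idea of a 'protein domain hotspot' concrete, the following is a minimal sketch, not the dissertation's actual method. It assumes that variants from all genes sharing a domain have already been mapped to 1-based positions in the domain alignment, that the null model spreads variants uniformly across domain positions, and that a one-sided binomial test with a simple Bonferroni correction is used. The new statistical correction for population-level data mentioned above is not reproduced here. The function names `binomial_sf` and `domain_hotspots` are hypothetical.

```python
from collections import Counter
from math import comb


def binomial_sf(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def domain_hotspots(domain_positions, domain_length, alpha=0.05):
    """Flag domain positions carrying more variants than a uniform null predicts.

    domain_positions: 1-based positions within the domain alignment, one entry
        per variant, pooled across every gene that contains the domain
        (assumed to be precomputed; the mapping step is not shown).
    domain_length: number of positions in the domain model.
    Returns (position, count, p_value) tuples that pass a Bonferroni cutoff.
    """
    n = len(domain_positions)
    p = 1.0 / domain_length            # uniform null: each position equally likely
    counts = Counter(domain_positions)
    threshold = alpha / domain_length  # Bonferroni correction over all positions
    hits = []
    for pos in range(1, domain_length + 1):
        k = counts.get(pos, 0)
        if k == 0:
            continue                   # an empty position cannot be a hotspot
        pval = binomial_sf(k, n, p)
        if pval < threshold:
            hits.append((pos, k, pval))
    return hits


if __name__ == "__main__":
    # Hypothetical toy data: 20 variants in a 50-position domain,
    # 8 of which fall on position 12 and the rest on distinct positions.
    toy = [12] * 8 + [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13]
    for pos, count, pval in domain_hotspots(toy, domain_length=50):
        print(f"position {pos}: {count} variants, p = {pval:.2e}")
```

Pooling variants across paralogs is what distinguishes this from a gene-centric test: a position that is mutated only once or twice in each individual gene can still emerge as a hotspot once the genes are aligned on their shared domain.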
Keywords/Search Tags: Human, Protein, Variants, Disease, Gene, Functional