
Analysis of high level patterns in genomic data: From protein thermostability to tumor biology

Posted on: 2008-06-21
Degree: Ph.D.
Type: Thesis
University: University of California, Los Angeles
Candidate: O'Connor, Brian Daniel
Full Text: PDF
GTID: 2444390005959310
Subject: Bioinformatics
Abstract/Summary:
Technological advancements, such as the release of the human genome sequence in 2001, have sparked the development of methods that allow biologists to ask previously intractable questions. A prime example is the DNA microarray, which enables the simultaneous monitoring of thousands of gene expression levels. Here, two bioinformatics projects that leveraged genome-scale proteomics and genomics datasets are presented. While the scientific questions examined in each were different, they shared the end goal of identifying high-level patterns across the data. The computational approaches used allowed for a better understanding of the specific mechanisms governing protein stability in thermophiles as well as gene expression in cancer.

In the proteomics study, protein sequences from nearly 200 microbial genomes were threaded onto known structures. The results supported a widespread use of structurally stabilizing disulfide bonds in intracellular proteins from most of the thermophilic bacteria and archaea. This surprising global pattern yielded additional insights into the possible mechanisms of disulfide maintenance, with the identification of a protein disulfide oxidoreductase specific to these thermophiles.

The genomics study examined several cancer gene expression datasets for subtle relationships. The method identified pairs of genes whose binary expression states matched the sample labels, such as prognostic categories or cancer subtypes, well only when logically combined. This result was important because it linked expression to observable classes in a novel way, using genes ignored by the vast majority of analysis methods. The global pattern of gene expression and sample class relationships ultimately revealed possible novel mechanisms in the disease states examined.

Together, these results highlight the importance of using information from rich, multiplexed datasets to understand nuanced patterns in biological systems. In both research projects, new insights were gained only when large numbers of either genomes or gene expression samples were compared. The computational challenges associated with such large bioinformatics studies are further explored in the final section of this thesis.
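
The gene-pair idea from the genomics study can be illustrated with a small sketch. This is not the thesis's implementation: the choice of logic functions (AND, OR, XOR), the thresholds `pair_min` and `single_max`, the function names, and the toy data are all assumptions made for illustration. It scans pairs of genes with binarized expression states and keeps those whose logical combination agrees with the sample labels even though neither gene does on its own.

```python
from itertools import combinations

# Hypothetical set of two-input logic functions; the abstract does not
# say which functions the thesis actually evaluated.
LOGIC = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
}


def agreement(states, labels):
    """Fraction of samples where a binary state vector equals the labels."""
    return sum(int(s) == int(l) for s, l in zip(states, labels)) / len(labels)


def best_single(states, labels):
    """Agreement of one gene with the labels, allowing an inverted match."""
    a = agreement(states, labels)
    return max(a, 1 - a)


def find_logic_pairs(expr, labels, pair_min=0.9, single_max=0.7):
    """Return (gene1, gene2, function, score) for pairs that match the labels
    only when combined. Both thresholds are illustrative assumptions."""
    hits = []
    for g1, g2 in combinations(sorted(expr), 2):
        # Skip pairs in which either gene is already informative alone.
        if max(best_single(expr[g1], labels),
               best_single(expr[g2], labels)) > single_max:
            continue
        for name, fn in LOGIC.items():
            combined = [int(fn(a, b)) for a, b in zip(expr[g1], expr[g2])]
            score = agreement(combined, labels)
            if score >= pair_min:
                hits.append((g1, g2, name, score))
    return hits


# Toy data (invented): 1 = expressed, 0 = not expressed, one entry per sample.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
expr = {
    "geneA": [1, 0, 1, 0, 1, 1, 0, 0],
    "geneB": [0, 1, 1, 0, 0, 1, 1, 0],
    "geneC": [1, 1, 0, 0, 1, 0, 1, 1],  # informative on its own, so excluded
}

hits = find_logic_pairs(expr, labels)
print(hits)
```

In this toy example, geneA and geneB each agree with the labels in only half of the samples, yet their XOR reproduces the labels exactly. Real data would first require a binarization step and some assessment of statistical significance, neither of which is shown here.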
Keywords/Search Tags: Protein, Gene expression, Patterns