
Bioinformatics methods for the analysis and interpretation of DNA and protein structure

Posted on: 2003-11-13
Degree: Ph.D
Type: Dissertation
University: University of California, Irvine
Candidate: Baisnee, Pierre Francois
GTID: 1460390011982365
Subject: Computer Science
Abstract/Summary:
This bioinformatics dissertation focuses on DNA and protein sequence analysis. We develop new sequence-based computational methods to investigate the structural or compositional properties of biological macromolecules.

We first develop a general framework for sequence analysis based on additive scales, structural or otherwise. The framework addresses the following issues: (1) construction of sequences with extremal properties; (2) quantitative evaluation of sequences with respect to a given genomic background; (3) automatic extraction of extremal sequences and profiles from genomic databases; (4) distribution and asymptotic behavior as the length N of the sequences increases; and (5) complete analysis of correlations between scales. The framework is applied to the analysis of DNA tandem repeats, using existing di- and tri-nucleotide scales that capture various aspects of DNA structure, including base stacking energy, propeller twist angle, protein deformability, bendability, and position preference. We derive exact expressions for counting the number of repeat-unit classes at all lengths. Tandem repeats are likely to result from a variety of different mechanisms, a fraction of which is likely to depend on profiles characterized by extreme structural features.

We then show that the genetic code generally allows for the superimposition of any DNA structural signal onto any protein-coding sequence, through amino acid substitution. Structural scales might thus usefully complement pure-sequence analysis in motif detection. Only punctual, loosely positioned signals can be freely superimposed on conserved amino acid sequences.

Using Markov models and genome-wide computations, we next measure and characterize the compositional symmetry observed between complementary DNA strands at orders 1–9. We establish the universality and variability range of strand symmetry. We show that symmetry emerges from the combined effects of a wide spectrum of mechanisms operating at multiple orders and length scales.

Lastly, we develop methods to identify and characterize an under-recognized form of interaction between protein chains, which is mediated by β-sheet formation and is central both to healthy biological function and to diseases ranging from AIDS and cancer to Alzheimer's and Huntington's diseases. We describe a database of such interchain β-sheet interactions within entries in the Protein Data Bank and corresponding likely macromolecules. An index quantifies the strength of the interactions.
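To make the notion of strand symmetry at a given order concrete, the sketch below compares the count of each length-k word on one strand with the count of its reverse complement. The function names and the particular index (one minus a normalized absolute count difference) are illustrative assumptions; the abstract does not specify the measure used in the dissertation, which relies on Markov models.

```python
from collections import Counter

# Translation table for Watson-Crick complements (uppercase DNA only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of an uppercase DNA word."""
    return word.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count overlapping length-k words, skipping any with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def symmetry_index(seq, k):
    """Hypothetical order-k symmetry index in [0, 1] (an assumption, not the
    dissertation's measure): 1.0 means every word occurs exactly as often
    as its reverse complement on the same strand."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid words of length k")
    # Include reverse complements that never occur, so their absence counts.
    words = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in words)
    # Each non-palindromic pair is visited twice, hence the factor of 2.
    return 1.0 - diff / (2 * total)
```

Calling `symmetry_index(genome, k)` for k from 1 to 9 would give one value per order for a sequence held in a string `genome`; a real genome-wide analysis would also need to handle multiple chromosomes and a statistical baseline, which this sketch omits.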
Keywords/Search Tags: DNA, Protein, Methods, Structural