
Multiple alignments of protein structures and their application to sequence annotation with hidden Markov models

Posted on: 2004-02-09
Degree: Ph.D.
Type: Dissertation
University: University of California, San Diego
Candidate: Scheeff, Eric David
Full Text: PDF
GTID: 1468390011970171
Subject: Biology
Abstract/Summary:
The current flood of raw protein sequence generated from the various genomics projects, coupled with the steadily increasing number of experimentally determined protein structures, presents special challenges and opportunities to computational biology. Against this backdrop, several studies are presented, all of which rely at their core upon multiple structure alignments of proteins.

First, a manual structure alignment is created for representatives from the protein kinase-like superfamily, and an analysis is undertaken. It is demonstrated that only a small core region responsible for ATP interaction and catalytic activity is truly conserved across the whole of the superfamily, and that a very small number of residues are conserved within this core.

Second, the manual alignment is used to present the difficulties inherent in the production of highly accurate alignments using automated methods. In addition, the manual alignment is used to help improve automated alignments generated by a new method for multiple structure alignment, CEfam. It is demonstrated that it is possible to substantially improve output results generated by an automated alignment method by carefully benchmarking different parameter sets against a manual alignment "gold standard". Alignments by CEfam for domains from the SCOP database are then used to generate a new Internet-accessible database, known as SASSY (Structural Alignments of SCOP Superfamilies).

Finally, the automated alignments generated by the properly tuned CEfam are applied to the problem of sequence annotation with hidden Markov models (HMMs) in a benchmarking experiment. Alignments of homologous sequence are built around each structure using an iterative HMM protocol tailored towards structural domains. Then, the structure alignment is used to generate a sequence alignment that represents an entire superfamily. This alignment is then used to train a structure linked alignment HMM (SLAHMM). It is shown that SLAHMMs can improve structure assignment performance at the superfamily level when used in combination with traditional iterated HMMs. Further, it is shown that SLAHMMs provide superior performance to standard HMMs in the correct assignment of fold-level structure similarities. It is concluded that SLAHMMs should be used as an additional method to improve structural annotation of unknown sequence.
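
The abstract does not say how automated alignments were scored against the manual "gold standard". As a minimal sketch of the general idea, and not the dissertation's actual procedure, the Python below assumes one common measure: the fraction of residue pairs aligned in the gold standard that an automated alignment also places in the same column. The function names, the gapped-string representation, and the parameter-set labels are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def aligned_pairs(rows):
    """Return the set of residue pairs that share a column.

    rows: equal-length gapped strings, one per protein, with '-' as the gap.
    Each residue is identified by (sequence index, ungapped residue index).
    """
    counters = [0] * len(rows)
    pairs = set()
    for column in zip(*rows):
        present = []
        for seq_idx, ch in enumerate(column):
            if ch != "-":
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of residues in this column counts as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def agreement(gold_rows, test_rows):
    """Fraction of gold-standard aligned pairs recovered by test_rows."""
    gold = aligned_pairs(gold_rows)
    if not gold:
        return 0.0
    return len(gold & aligned_pairs(test_rows)) / len(gold)


def best_parameter_set(gold_rows, candidates):
    """Pick the label whose alignment agrees best with the gold standard.

    candidates: dict mapping a (hypothetical) parameter-set label to the
    alignment produced with it.
    """
    return max(candidates, key=lambda name: agreement(gold_rows, candidates[name]))


# Toy data (invented): two short sequences, ACGT and ACTG.
gold = ["AC-GT", "ACTG-"]
candidates = {
    "default": ["ACGT", "ACTG"],    # ungapped alignment
    "tuned": ["AC-GT", "ACTG-"],    # identical to the gold standard
}
print(best_parameter_set(gold, candidates))
```

A pair-based score of this kind rewards an automated method only for residue equivalences that the manual alignment also asserts, which is one way to compare parameter sets on equal footing.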
Keywords/Search Tags:Sequence, Alignment, Structure, Protein, Annotation, Used, Multiple, Generated