Advanced protein sequence analysis methods for structure and function prediction

Posted on: 2006-06-18
Degree: Ph.D.
Type: Dissertation
University: University of Delaware
Candidate: Kahsay, Robel Y
Full Text: PDF
GTID: 1450390005495302
Subject: Computer Science
Abstract/Summary:
The rapid accumulation of genomic data from high-throughput genome projects leaves scientists with the enormous task of characterizing each protein encoded by these genomes in order to understand how the proteins function together in a living cell. Experimental approaches for verifying protein structure and function are expensive, time consuming, and in some cases inapplicable, which has motivated the development of computational methods for reliable, large-scale characterization of proteins.

This dissertation presents my research contributions in two major areas of protein sequence analysis. The first is in protein sequence comparison, where I developed a heuristic approach for comparing profile hidden Markov models based on their quasi-consensus sequences. This method, referred to as QC-COMP, is shown to be significantly faster and more accurate than COMPASS, the existing state-of-the-art method. In a related project, I built a web-based benchmark server for the Critical Assessment of Sequence alignment Accuracy.

As a second contribution, I developed an improved hidden Markov model for topology prediction and identification of integral membrane proteins. The resulting program, TMMOD, is a systematic modification of an existing model (TMHMM) that addresses key performance issues. In benchmark experiments, TMMOD shows significantly improved accuracy. In cross-validation experiments on a set of 83 transmembrane proteins with known topology, TMMOD outperformed TMHMM and other existing methods, with an accuracy of 89% for both topology and locations. In another experiment on a separate set of 160 transmembrane proteins, TMMOD achieved an accuracy of 84% for topology and 89% for locations.

Most computational methods for transmembrane protein topology prediction rely on the compositional bias of amino acids to locate the hydrophobic domains of transmembrane proteins. Because signal peptides also contain hydrophobic segments, these methods mistakenly identify signal peptides as transmembrane proteins. The SVM-Fisher discrimination approach was applied to further improve the ability of TMMOD to identify signal peptides as negatives. With this method, misprediction of signal peptides as membrane proteins was reduced by more than a third.
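
The reliance on compositional bias described above can be illustrated with a minimal sliding-window hydropathy scan. This is a sketch only, not TMMOD, TMHMM, or any method from the dissertation: it assumes the Kyte-Doolittle hydropathy scale, and the function names, the 19-residue window, and the 1.6 threshold are illustrative choices (the latter two are conventional for this scale, not values taken from the abstract).

```python
# Kyte-Doolittle hydropathy values (assumed scale; not from the dissertation).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    values = [KD[residue] for residue in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy reaches the threshold.

    Returns half-open (start, end) index pairs. The scan looks only at
    amino acid composition, so it cannot tell a transmembrane helix from
    any other hydrophobic stretch.
    """
    segments = []
    for start, score in enumerate(window_scores(seq, window)):
        if score < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous segment: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: charged flanks around a 20-residue leucine stretch.
    toy = "D" * 30 + "L" * 20 + "K" * 30
    print(hydrophobic_segments(toy))
```

Because a scan like this scores composition alone, a sufficiently hydrophobic stretch in a signal peptide can cross the threshold in the same way a transmembrane helix does, which is the kind of false positive the SVM-Fisher step is described as reducing.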
Keywords/Search Tags:Protein, Prediction, Method, Signal peptides, TMMOD, Function