State-of-the-art protein secondary-structure prediction using a novel two-stage alignment and machine-learning method

Posted on: 2009-07-14
Degree: Ph.D
Type: Dissertation
University: University of Florida
Candidate: Gates, Ami M
Full Text: PDF
GTID: 1440390002493846
Subject: Computer Science
Abstract/Summary:
While the complexity of biological systems often appears intractable, living organisms possess an underlying correlation derived from their hierarchical association. This notion enables methods such as machine learning techniques, Bayesian statistics, nearest neighbor, and known sequence-to-structure exploration to discover and predict biological patterns.

As proteins are the direct expression of DNA, they are the center of all biological activity. Thousands of new protein sequences are discovered each year, and knowledge of their biological importance relies on the determination of their folded, or tertiary, structure. Secondary structure prediction plays an important role in protein tertiary structure prediction, as well as in the characterization of general protein structure and function.

The protein secondary structure prediction problem is defined as a three-state classification problem. Given any linear sequence of one-letter coded amino acids, the goal is to predict the secondary structure membership of each amino acid.

Machine-learning-based techniques are commonly and increasingly used for secondary structure prediction. Over the past few decades, several algorithms and their variations have been used to predict protein secondary structure, including multi-layered neural networks and ensembles of support vector machines.

DARWIN is a new protein secondary structure prediction server built on a novel two-stage system that is unlike any current state-of-the-art method. DARWIN specifically addresses the decline in accuracy caused by a lack of known homologous sequences. It does so by balancing and maximizing PSI-BLAST information, by using a new method termed fixed-size fragment analysis (FFA), and by filling in gaps, ends, and missing information with an ensemble of support vector machines. DARWIN comprises a unique combination of homology consensus modeling, fragment consensus modeling, and support vector machine learning.

DARWIN has been tested against several leading prediction servers, and the results show that DARWIN exceeds current state-of-the-art accuracy for all explored test sets.
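
The three-state classification framing described above can be illustrated with a minimal Python sketch. It is not DARWIN's implementation. The state labels H (helix), E (strand), and C (coil) are the conventional three-state alphabet and are assumed here, since the abstract does not name the states. The function names, the example sequence, and the example labels are hypothetical, and the all-coil predictor is only a placeholder for a real method.

```python
# Minimal sketch of the three-state, per-residue prediction problem.
# Assumptions (not from the abstract): states are labeled H/E/C, and
# accuracy is the fraction of residues whose predicted state matches.

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes
STATES = set("HEC")  # assumed labels: H = helix, E = strand, C = coil


def check_sequence(sequence: str) -> None:
    """Raise ValueError if the sequence is empty or has non-standard codes."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    unknown = set(sequence) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"unknown amino acid codes: {sorted(unknown)}")


def placeholder_predict(sequence: str) -> str:
    """Hypothetical stand-in predictor: assigns coil to every residue."""
    check_sequence(sequence)
    return "C" * len(sequence)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where predicted and observed states agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and equal length")
    if (set(predicted) | set(observed)) - STATES:
        raise ValueError("labels must be drawn from H, E, C")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical example: one label per amino acid in the input sequence.
example_sequence = "MKTAYIAKQR"
example_observed = "CCHHHHHHCC"
example_predicted = placeholder_predict(example_sequence)
example_accuracy = per_residue_accuracy(example_predicted, example_observed)
```

The point of the sketch is the shape of the problem: the input is a string of one-letter amino acid codes, the output is a string of the same length over three states, and accuracy is measured per residue. Any real predictor, including a two-stage system, would replace `placeholder_predict`.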
Keywords/Search Tags:Prediction, Protein secondary, DARWIN, State-of-the-art, Biological