A Study On Prediction Of Protein Fold Based On Machine Learning

Posted on: 2011-10-12 | Degree: Master | Type: Thesis
Country: China | Candidate: H J Guo | Full Text: PDF
GTID: 2120360305476551 | Subject: Computer software and theory
Abstract/Summary:
The 3D structure of a protein determines its biological function. A protein fold pattern is a representation of the topology of protein structures. There are more than 100,000 protein structures in nature, while the number of fold types is only a little over 1,000. Prediction of the protein fold is therefore a significant scientific task and valuable for potential applications.

SCOP is a manually classified database of protein folds. In this thesis, the information about protein folds in SCOP is studied, and the relationship between amino acid properties and folds is analyzed. Several classifiers are developed that predict the protein fold from a given amino acid sequence.

Two training sets are collected from the SCOP database. Based on biochemical knowledge about amino acids, attributes are extracted from the training sets, such as the folding pattern, length, frequency and hydrophobic properties of the amino acid sequence. The corresponding SVM classifiers are constructed, and rigorous open tests are conducted to evaluate their performance. The prediction results are further interpreted in terms of the topology information of the predicted protein structures.

For the de novo prediction problem, the complete information of the amino acid sequence is retained as the attributes for training classifiers, and Bayesian classifiers are employed to predict the protein fold. Both the SVM and the Bayesian (BN) classifiers are used to obtain consistent prediction results. The fold prediction results are applied to generate a good-quality fragment library for de novo prediction of protein structure. With the help of fold prediction, the accuracy of the de novo prediction is improved.

The experimental results presented in this thesis show that the prediction accuracy on the open testing sets reaches 84.6154% in the best case, and that fold prediction based on machine learning can provide a valuable enhancement to de novo prediction.
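
As an illustration of the attribute extraction mentioned in the abstract (length, frequency and hydrophobic properties of the amino acid sequence), the following minimal Python sketch turns a sequence into a fixed-length numeric vector. It is not the thesis's actual feature set: the function name `sequence_features`, the use of the Kyte-Doolittle hydropathy scale, and the choice of a single mean hydrophobicity value are all assumptions made for the example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed here; the thesis does not name its scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq):
    """Hypothetical helper: return [length, 20 residue frequencies, mean hydropathy]."""
    # Keep only standard residues; unknown symbols such as 'X' are skipped.
    residues = [r for r in seq.upper() if r in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")

    n = len(residues)
    counts = Counter(residues)

    # Relative frequency of each amino acid (amino acid composition).
    composition = [counts[a] / n for a in AMINO_ACIDS]

    # Average hydropathy over the whole sequence.
    mean_hydropathy = sum(KYTE_DOOLITTLE[r] for r in residues) / n

    return [float(n)] + composition + [mean_hydropathy]


if __name__ == "__main__":
    vector = sequence_features("MKTAYIAKQR")
    print(len(vector))  # 22 values: 1 length + 20 frequencies + 1 hydropathy
```

Vectors of this kind, paired with SCOP fold labels, could then be passed to any SVM implementation (for example `sklearn.svm.SVC` from scikit-learn) for training; the kernel and parameters used in the thesis are not stated in the abstract.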
Keywords/Search Tags: Protein fold prediction, SVM, Bayesian classifier, de novo prediction