Prediction Model Building And Application Of Protein Fold Type

Posted on: 2018-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Jing
Full Text: PDF
GTID: 2310330563952706
Subject: Biomedical engineering
Abstract/Summary:
Proteins, as the carriers of life activities, have always been a focus of research in the life sciences. Experimental determination of protein structures lags far behind the rate at which protein sequences are determined, so protein 3D structure prediction has become increasingly important. Many studies have shown that although protein structures are diverse, the number of fold types is limited. Data published by SCOP and CATH show that the number of fold types is more than a thousand. Studying fold types is simpler and more accurate than studying full protein 3D structures. Protein fold recognition, the study of fold types, is one of the methods used for protein 3D structure prediction. This thesis focuses on the α, β, α/β and α+β proteins in the SCOPe database and establishes a sequence-based method of protein fold recognition. The method increases the coverage of samples that can be recognized while keeping recognition accuracy high. The main work of this thesis includes:

1. Establishing the family-HMMs and extended family-HMMs
We performed multiple structure alignment for the families with more than one crystal structure, extracted the relevant information and used it to build family hidden Markov models (HMMs), and then formed the family-HMMs set for protein fold types, referred to as family-HMMs. The extended family-HMMs were formed on the basis of the family-HMMs.

2. Establishing the superfamily-HMMs and extended superfamily-HMMs
We performed multiple structure alignment for the superfamilies with more than one crystal structure, extracted the relevant information and used it to build superfamily hidden Markov models (HMMs), and then formed the superfamily-HMMs set for protein fold types, referred to as superfamily-HMMs. The extended superfamily-HMMs were formed on the basis of the superfamily-HMMs.

3. Model recognition test
Four test sets were constructed from different aspects based on the data in the SCOPe-2.05 and SCOPe-2.06 databases. The four model sets were tested using these four test sets, and the recognition accuracy was high.

4. Automated identification
This thesis built a database of protein fold types based on the four model sets. Its functions include automatically identifying the fold type of a query protein sequence and updating the model sets.

The coverage rates of the family-HMMs set and the superfamily-HMMs set for the recognized samples in this research are 86% and 68%, respectively; the fold recognition accuracy rates reach 97% and 95%, respectively, for covered samples from the four classes of proteins. The extended family-HMMs set and the extended superfamily-HMMs set both reach a coverage rate of 97%, with fold recognition accuracies above 95% and 93%, respectively.

In summary, this thesis focuses on the α, β, α/β and α+β proteins in the SCOPe database and establishes four model sets: family-HMMs, extended family-HMMs, superfamily-HMMs and extended superfamily-HMMs. These four model sets can be used for protein fold recognition, and the recognition is carried out automatically. The method achieves both high sample coverage and high recognition accuracy.
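The results above report two separate figures for each model set: the coverage rate, and the recognition accuracy on covered samples. The sketch below shows one way these two quantities could be computed. It is an illustration under stated assumptions, not the procedure used in the thesis: the function name `coverage_and_accuracy`, the convention that `None` marks a sequence no model in the set could recognize, and the toy identifiers and fold labels are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def coverage_and_accuracy(
    predictions: Dict[str, Optional[str]],
    true_folds: Dict[str, str],
) -> Tuple[float, float]:
    """Return (coverage, accuracy on covered samples) for one model set.

    Assumption: a prediction of None (or a missing entry) means that no
    model in the set recognized the sequence, i.e. it is not covered.
    """
    total = len(true_folds)
    if total == 0:
        raise ValueError("true_folds must not be empty")
    # A sample is covered when the model set produced any fold prediction.
    covered = [sid for sid in true_folds if predictions.get(sid) is not None]
    # Accuracy is measured only over the covered samples.
    correct = sum(1 for sid in covered if predictions[sid] == true_folds[sid])
    coverage = len(covered) / total
    accuracy = correct / len(covered) if covered else 0.0
    return coverage, accuracy


# Toy data with made-up identifiers and fold labels (not from the thesis).
true_folds = {"s1": "a.1", "s2": "b.1", "s3": "c.1", "s4": "d.15"}
predictions = {"s1": "a.1", "s2": "b.1", "s3": "c.2", "s4": None}

coverage, accuracy = coverage_and_accuracy(predictions, true_folds)
print(f"coverage={coverage:.2f}, accuracy={accuracy:.2f}")
```

Keeping the two numbers separate matters because a model set can be very accurate on the sequences it recognizes while leaving many sequences unrecognized. This is the gap the extended model sets are reported to close.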
Keywords/Search Tags: fold recognition, HMMs, family, superfamily