A Study On The Protein Secondary Structure Prediction And The Connection Between Protein Secondary Structure And Its 3D Structure

Posted on: 2009-05-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y E Feng
GTID: 1100360245987012
Subject: Biophysics
Abstract/Summary:
Knowledge of a protein's structure is important for understanding its function. With the success of the Human Genome Project, a widening gap has appeared between the rapidly increasing number of known protein sequences and the slow accumulation of known protein structures. Experimental methods for high-resolution protein structure determination, such as X-ray crystallography, NMR, and electron microscopy, are now available. However, purely experimental approaches to determining protein structure are time-consuming and expensive. Theoretical and computational methods for predicting protein structures have therefore become increasingly important.

At present, directly predicting a protein's three-dimensional (3D) structure from its amino acid sequence is a difficult task. A large number of approaches have been developed to predict protein secondary structure. Secondary structure prediction is often regarded as the first step toward understanding and predicting tertiary structure, because secondary structure elements constitute the building blocks of the folding units. Predicting secondary structure as an intermediate step thus plays an important role in tertiary structure prediction.

In this dissertation, we introduce a novel sequence-based method, tetra-peptide-based increment of diversity with quadratic discriminant analysis (TPIDQD), for protein secondary structure prediction, and test it on two different datasets. Moreover, we investigate the connection between protein secondary structure and 3D structure for 325 proteins.

(1) The proposed TPIDQD method consists of three steps. First, the frequencies of three kinds of tetra-peptide structural words occurring in a sequence fragment are used as the source of diversity. Second, the increment of diversity combined with quadratic discriminant analysis (IDQD) is used to predict the structure of the central residue of a sequence fragment. Finally, the IDQD prediction is corrected by removing structure fluctuations and correcting structure boundaries using tetra-peptide boundary words.

(2) The TPIDQD method is based on tetra-peptide structural words and is used to predict the structure of the central residue of a sequence fragment. The three-state overall per-residue accuracy (Q3) reaches 79.19% in a three-fold cross-validation test for 21-residue fragments on the CB513 dataset.

(3) An enlarged dataset is constructed, containing 1645 protein chains with resolution better than 3 Angstroms and sequence identity below 25%. The TPIDQD method is tested on this 1645-protein dataset and a higher accuracy is obtained: Q3 is 79.68% in a ten-fold cross-validation test for 21-residue fragments. The accuracy can be improved further by taking long-range sequence information (fragments longer than 21 residues) into account in prediction. Moreover, Q3 reaches 79% on the independent test set as the number of structural words increases.

(4) We have investigated the relation between protein secondary structure and its 3D structure for 325 samples and obtained a better result.
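
The quantities named in the abstract can be illustrated with a short sketch. It is not the dissertation's implementation. It assumes the definition of the measure of diversity that is commonly used in the increment-of-diversity literature, D(X) = N log N - sum_i n_i log n_i, with ID(X, Y) = D(X + Y) - D(X) - D(Y), and it counts plain overlapping tetra-peptides in a fragment rather than the thesis's structural words, whose exact construction the abstract does not give. The function names are hypothetical. Q3 is computed as the percentage of residues whose predicted state (H, E, or C) matches the observed state.

```python
import math
from collections import Counter


def tetrapeptide_counts(fragment):
    """Count overlapping 4-residue words in a sequence fragment.

    Assumption: plain tetra-peptides stand in for the thesis's
    'structural words', which the abstract does not define in detail.
    """
    return Counter(fragment[i:i + 4] for i in range(len(fragment) - 3))


def diversity(counts):
    """Measure of diversity D(X) = N*log(N) - sum_i n_i*log(n_i).

    Assumed standard definition; zero counts contribute nothing.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y).

    The value is 0 for identical count profiles and grows as the
    two profiles share fewer words.
    """
    return diversity(x + y) - diversity(x) - diversity(y)


def q3(predicted, observed):
    """Three-state per-residue accuracy, in percent."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

In a pipeline of the kind the abstract describes, the increment of diversity between a 21-residue fragment's word counts and a reference profile for each structural class would presumably serve as an input feature to the quadratic discriminant step. That step and the boundary correction are omitted here.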
Keywords/Search Tags: Protein secondary structure prediction, Tetra-peptide structural words, Increment of diversity, Quadratic discriminant analysis, Long-range interaction, Generalization secondary structure sequence, 3D distance, Coefficient of correlation