
Research On Local Protein Fragment Constructural Properties’ Prediction Based HMM Methods

Posted on: 2014-12-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Y Yu
GTID: 1260330422954193
Subject: Control theory and control engineering
Abstract/Summary:
With the successful completion of the Human Genome Project, predicting protein structure and function has become one of the most important challenges in computational biology. Predicting the three-dimensional structure of a protein from its amino acid sequence has not only scientific significance but also great practical value in medicine and biological engineering, and it will have a profound impact on our understanding of human life. At present, most prediction techniques are knowledge-based data mining and machine learning methods that use structural fragments as building blocks for assembly. However, a limited number of building blocks is inherently discrete and cannot cover the whole conformational space of protein structure. A sampling bottleneck also exists in the search of protein conformational space, especially in the continuous backbone angle space, which reduces prediction accuracy. Therefore, how to handle local structural fragments, and how to recognize them and sample their backbone conformations correctly, is key to further improving the accuracy of protein structure prediction.

This dissertation focuses on the prediction of local protein structural fragments. The structural conformation of a protein fragment is constructed by modeling and real-value prediction of two important structural properties: backbone torsion angle (BTA) and relative solvent accessibility (RSA). The fragment structures are then assembled to form the protein tertiary structure. A relatively complete protocol framework and an integrated prediction system for the structural properties are presented.
In this dissertation, the distribution characteristics of the protein backbone torsion angles (Φ, ψ) are studied, and a mixture model of bivariate cosine distributions for BTA and an improved hidden Markov model (HMM) for predicting protein structural properties are proposed. In applying the model, a backtrack dynamic sampling algorithm is developed to obtain the structural conformation of a protein fragment. On this basis, a protocol framework is suggested for identifying and locating protein motif patterns and for modeling and predicting the corresponding motifs. A variable-length sliding-window scanning algorithm is developed to identify structural motif fragments. Finally, the HMM is further improved, and an integrated system for real-value prediction of protein structural properties is presented that obtains the real values of RSA and BTA for a protein fragment at the same time. The main research contributions are as follows.

1) Based on the structural characteristics of protein motif fragments, an improved hidden Markov model is presented. The model is intended to establish a kind of state for each protein structural property. Each hidden node in the model represents a specific residue position in the local structural fragment chain and is closely associated with four kinds of emission nodes, each with its own definite probability distribution. Several algorithms are given to solve the evaluation, decoding, and learning problems of the improved model. Using the corresponding probability distributions of BTA and RSA, the improved HMM better captures the relevance and dependence of structural properties between two adjacent residues in a protein motif fragment.

2) Based on the preference information and distribution characteristics of the backbone torsion angles (Φ, ψ) in protein motif fragments, a mixture model of bivariate cosine distributions is proposed to model the correlation between Φ and ψ for each residue.
An expectation-maximization (EM) algorithm is used to estimate the parameters of the mixture model. Instead of arbitrarily dividing the angles into several interval states, the probability density function of the backbone torsion angles (Φ, ψ) is described as a continuous directional statistical distribution, which avoids the angle discretization used by many traditional methods. In the continuous (Φ, ψ) space, a sequence of dihedral angle pairs (Φ, ψ) describes the backbone conformation of a protein motif fragment. The improved HMM and the backtrack dynamic sampling algorithm are used to ensure unbiased conformational sampling in (Φ, ψ) space.

3) The discrete nature of the building blocks (BBs) is inconsistent with the continuous nature of protein backbone conformation. To address this problem, a probabilistic conformation sampling method based on the improved HMM is presented. It takes the protein amino acid sequence and the corresponding secondary structure information as model input and samples the backbone conformation of the fragment in the continuous (Φ, ψ) space. A new backtrack dynamic sampling algorithm is developed for the HMM to capture the dependencies of structural attributes between adjacent residues in protein fragment chains. The method can repeatedly sample fragment conformations close to the native structure, and it effectively resolves the bottleneck of conformational search in the continuous (Φ, ψ) space. On the optimal path of the model, some well-known protein structural motif fragments are well reproduced.

4) Following the fragment assembly process used in protein structure prediction, a protocol framework for local protein structure prediction is proposed that follows the hierarchical organization of protein structure topology.
The framework searches for and locates structural motif fragments along the query amino acid sequence and samples the corresponding (Φ, ψ) angles in the continuous conformational space. The structural conformations of the motif fragments are then constructed for protein tertiary structure assembly. The framework is divided into two major parts: identification and location of protein motif patterns, and modeling and prediction of the corresponding motifs. In the recognition process, a variable-length sliding-window scanning algorithm is developed to identify structural motif fragments. The window length varies from an initial value of 7 to a maximum of 19. Each query sequence fragment is matched against each of the 82 kinds of standard motif patterns with the same length. This framework can serve as a foundation for better protein tertiary structure prediction.

5) Because most protein structural attributes are continuous variables, an integrated system for real-value prediction of local protein structural properties is developed. The real values of two structural properties of a protein fragment, RSA and BTA, are predicted at the same time. This replaces prediction methods in which structural attributes are arbitrarily classified into several predefined states. The earlier HMM is further improved: the state transition probability matrix of the current hidden node depends on the state of the preceding hidden node and on the combined observations of the preceding emission nodes. The state sequence is still a hidden Markov chain. Using the probability distributions of RSA and BTA, the model better captures the relevance and dependence of structural properties between two adjacent residues in a fragment.
The three main problems of the improved model (evaluation, decoding, and learning) are further derived.

The studies in this dissertation provide effective solutions for predicting protein structural properties, sampling structural conformations, identifying and locating protein motif fragments, and improving fragment assembly accuracy, and they can serve as intermediate steps toward better protein tertiary structure prediction.
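
As a rough illustration of the mixture model in contribution 2, the sketch below evaluates a mixture of bivariate cosine densities over the (Φ, ψ) torus. The abstract does not give the parameterization, so this assumes the cosine model commonly used in directional statistics, with kernel exp(κ1 cos(Φ − μ) + κ2 cos(ψ − ν) − κ3 cos(Φ − μ − ψ + ν)). The normalizing constant is approximated numerically on a grid, and the function names, weights, and parameter values are illustrative assumptions, not fitted results from the dissertation.

```python
import numpy as np


def cosine_log_kernel(phi, psi, mu, nu, k1, k2, k3):
    """Unnormalized log-density of the assumed bivariate cosine model (radians)."""
    return (k1 * np.cos(phi - mu)
            + k2 * np.cos(psi - nu)
            - k3 * np.cos(phi - mu - psi + nu))


def grid_normalizer(mu, nu, k1, k2, k3, n=360):
    """Approximate the integral of exp(kernel) over [-pi, pi)^2 on a uniform grid."""
    grid = np.linspace(-np.pi, np.pi, n, endpoint=False)
    phi, psi = np.meshgrid(grid, grid, indexing="ij")
    cell_area = (2.0 * np.pi / n) ** 2
    kernel = np.exp(cosine_log_kernel(phi, psi, mu, nu, k1, k2, k3))
    return float(kernel.sum() * cell_area)


def mixture_density(phi, psi, weights, components):
    """Weighted sum of normalized cosine-model densities at (phi, psi)."""
    total = 0.0
    for weight, params in zip(weights, components):
        kernel = np.exp(cosine_log_kernel(phi, psi, *params))
        total = total + weight * kernel / grid_normalizer(*params)
    return total


# Illustrative (not fitted) components, each given as (mu, nu, k1, k2, k3).
weights = [0.6, 0.4]
components = [(-1.1, -0.7, 8.0, 8.0, 2.0), (-2.1, 2.4, 4.0, 4.0, 1.0)]
density_at_helix = mixture_density(-1.1, -0.7, weights, components)
```

In an EM fit, the weights and the five parameters of each component would be re-estimated from observed angle pairs; that step is omitted here because the abstract gives no details of it.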
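
The variable-length sliding-window scan in contribution 4 can be illustrated with a minimal Python sketch. Only the window bounds (7 to 19) and the idea of matching each window against motif patterns of the same length come from the abstract. The motif library layout, the `identity_score` scorer, and the threshold are hypothetical stand-ins for the HMM-based matching, which the abstract does not specify.

```python
from typing import Dict, Iterator, List, Tuple

MIN_LEN, MAX_LEN = 7, 19  # window bounds stated in the abstract


def sliding_windows(sequence: str,
                    min_len: int = MIN_LEN,
                    max_len: int = MAX_LEN) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) for every window length from min_len to max_len."""
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            yield start, sequence[start:start + length]


def identity_score(fragment: str, consensus: str) -> float:
    """Hypothetical scorer: fraction of positions equal to a consensus string."""
    matches = sum(a == b for a, b in zip(fragment, consensus))
    return matches / len(consensus)


def scan_for_motifs(sequence: str,
                    motifs: Dict[str, str],
                    threshold: float) -> List[Tuple[int, int, str, float]]:
    """Compare each window only with motif patterns of the same length."""
    hits = []
    for start, fragment in sliding_windows(sequence):
        for name, consensus in motifs.items():
            if len(consensus) != len(fragment):
                continue
            score = identity_score(fragment, consensus)
            if score >= threshold:
                hits.append((start, len(fragment), name, score))
    return hits


# Toy example with a single made-up motif of length 7.
hits = scan_for_motifs("MMACDEFGHKK", {"toy_motif": "ACDEFGH"}, threshold=1.0)
```

A real library would hold the 82 standard motif patterns, and the score would presumably come from the improved HMM rather than from string identity.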
Keywords/Search Tags: protein structure prediction, structural motif, backbone torsion angle, relative solvent accessibility, conformation sampling, fragment assembly, hidden Markov model, real-value prediction