Research On Algorithms For Protein And RNA Tertiary Structure Prediction

Posted on:2023-08-30

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z Y Du

Full Text:PDF

GTID:1520306797994189

Subject:Bioinformatics

Abstract/Summary:

Biological macromolecules such as proteins and RNAs play critical roles in living processes. Since structure determines function, understanding the tertiary structure of these biomacromolecules is a prerequisite for studying their mechanisms and functions. However, traditional structure determination methods are usually laborious and time-consuming, which makes the development of computational methods for structure prediction urgent. To this end, this dissertation focuses on the problem of protein and RNA structure prediction. For protein structure prediction, we introduce a homology modeling algorithm named DISthread, the web server and standalone package for the de novo folding algorithm trRosetta, and APD, an assessment method for protein inter-residue distance prediction. For RNA structure prediction, we present a template-based algorithm called IProAlign.

In recent years, the increasing accuracy of protein inter-residue distance prediction has opened a new avenue for improving the performance of homology modeling algorithms. We therefore develop a threading algorithm called DISthread, which combines 1-dimensional sequence information with 2-dimensional distance information. It first uses 1-dimensional information such as sequence profile, secondary structure and solvent accessibility to generate an initial alignment, then iteratively incorporates predicted inter-residue distances into the scoring function. Benchmark tests on 4 independent datasets indicate that DISthread outperforms comparable algorithms that use sequence and contact information, such as HHpred, SPARKS-X, MUSTER, EigenTHREADER, map_align and CATHER, as well as the other distance-based threading method, DeepThreader. The harder the target protein, the greater the improvement of DISthread over these methods. By comparing with baseline methods and analyzing the predicted distances, we show that including 2-dimensional inter-residue distance information significantly improves the performance of our threading algorithm.

The trRosetta server is a web-based platform for fast and accurate protein structure prediction. Given an amino acid sequence or a multiple sequence alignment as input, a deep neural network is first used to predict inter-residue distances and orientations. The predicted geometries are then converted into restraints that guide structure prediction by direct energy minimization, which is implemented within the Rosetta framework. The trRosetta server distinguishes itself from similar structure prediction servers by its rapid and accurate de novo structure prediction. For illustration, trRosetta was applied to two Pfam families with unknown structures, for which the predicted de novo models were estimated to have high accuracy. Nevertheless, to take advantage of homology modeling, homologous templates are automatically used as additional inputs to the network. In general, it takes ~1 h to predict the final structure of a typical protein with ~300 amino acids. To enable large-scale structure modeling, a downloadable package of trRosetta with open-source code is available as well.

Significant progress has been achieved in distance-based protein folding, owing to improved prediction of inter-residue distances by deep learning. Many efforts have thus been made in recent years to improve distance prediction. However, the best way to objectively assess the accuracy of predicted distances remains unknown. To this end, we propose a method called APD for assessing inter-residue distance prediction. The assessment can be done in three different flavors: prediction-oriented, native-oriented and full-list, depending on the set of residue pairs being assessed. A total of 19 metrics are proposed to measure the accuracy of predicted distances. These metrics are discussed and compared quantitatively on three benchmark datasets, with distances and structure models predicted by the trRosetta pipeline. Experimental results show that a few metrics, such as distance precision, correlate strongly with the model accuracy measure TM-score (Pearson's correlation coefficient > 0.7). In addition, these metrics are applied to rank the distance prediction groups in CASP14. The ranking by our metrics largely coincides with the official one. These data suggest that the proposed metrics are effective for measuring the accuracy of distance prediction.

IProAlign is a homology modeling method for RNA tertiary structure prediction. It first simplifies the linear arc diagram of the RNA secondary structure and aligns all base pairs by integer programming. The secondary structure alignment is then incorporated into the scoring function for global alignment. During this process, a sequence profile calculated from the preprocessed multiple sequence alignment of the query sequence is used in the design of the scoring function. A benchmark test on the TE80 dataset shows that, after excluding templates sharing over 40% sequence identity with the query RNA, IProAlign generates better alignments than Foldalign, which has similar coverage to IProAlign, and the improvement is more significant on RNAs with pseudoknots. Results on the PUZ30 dataset show that the model quality of IProAlign is better than that of LocARNA, CARNA, RNAmountAlign and Foldalign. By comparing different strategies for using the multiple sequence alignment, we show that IProAlign uses it in an efficient way. In addition, an analysis of secondary structure prediction accuracy and alignment coverage demonstrates that the model quality of IProAlign is associated with both factors to some extent. By comparing the running time of searching against a non-redundant template library, we show that IProAlign is faster than the other methods and that its speed is less affected by sequence length. All of the above results suggest that IProAlign is a fast and efficient algorithm for detecting homologous templates.
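To make the APD assessment more concrete, the following is a minimal Python sketch of how a distance precision metric might be computed under the three assessment flavors, and how its agreement with TM-score could be checked by Pearson correlation. The abstract does not define the metric, so everything here is an assumption for illustration: the function names, the 15 Å `cutoff`, the 2 Å `tolerance`, the minimum sequence separation `min_sep`, and the reading of the flavors as "pairs predicted to be close", "pairs that are close in the native structure" and "all pairs". It is not the dissertation's implementation.

```python
from math import sqrt

FLAVORS = ("prediction-oriented", "native-oriented", "full-list")


def select_pairs(pred, native, flavor, cutoff=15.0, min_sep=6):
    """Return the residue pairs (i, j), i < j, to assess under one flavor.

    Assumed reading: prediction-oriented keeps pairs predicted closer than
    `cutoff`, native-oriented keeps pairs natively closer than `cutoff`,
    and full-list keeps every pair at least `min_sep` residues apart.
    """
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor: {flavor!r}")
    n = len(native)
    pairs = []
    for i in range(n):
        for j in range(i + min_sep, n):
            if flavor == "prediction-oriented" and pred[i][j] >= cutoff:
                continue
            if flavor == "native-oriented" and native[i][j] >= cutoff:
                continue
            pairs.append((i, j))
    return pairs


def distance_precision(pred, native, flavor="full-list",
                       tolerance=2.0, cutoff=15.0, min_sep=6):
    """Fraction of assessed pairs whose predicted distance lies within
    `tolerance` of the native distance (hypothetical definition)."""
    pairs = select_pairs(pred, native, flavor, cutoff, min_sep)
    if not pairs:
        return 0.0
    hits = sum(1 for i, j in pairs
               if abs(pred[i][j] - native[i][j]) <= tolerance)
    return hits / len(pairs)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

In use, `pred` and `native` would be symmetric distance matrices (lists of lists, in Å) for one target. Calling `distance_precision` once per target and passing the resulting list together with the per-target TM-scores to `pearson` gives the kind of correlation the abstract reports.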

Keywords/Search Tags:

protein structure prediction, homology modeling, de novo folding, inter-residue distance, RNA structure prediction, multiple sequence alignment, secondary structure, integer programming

Related items

1	Research On Key Techniques For Protein Residue Contact And Distance Prediction
2	A Study On The Protein Secondary Structure Prediction And The Connection Between Protein Secondary Structure And Its 3D Structure
3	Research Of Secondary Structure And Residue-Residue Contact Based Protein Structure Prediction Method
4	Research On Protein Structure Prediction And Structure Alignment
5	Feature Extraction And Deep Learning Method For Protein Inter-residue Interaction Prediction
6	Homology-based protein structure prediction: Fold recognition and alignment
7	Protein Structure Alignment Based On Secondary Structure Elements
8	Research On RNA Secondary Structure Prediction Based On U-net Convolutional Neural Networks
9	Unifying framework for the prediction of protein folding pathways and tertiary structure from primary sequence
10	The Study On Atomic Distance-Dependent Pair-wise Statistical Potentials In Protein Structure Prediction