Biological macromolecules such as proteins and RNAs play critical roles in life processes. Since structure determines function, it is necessary to understand the tertiary structure of these biomacromolecules before studying their mechanisms and functions. However, traditional structure determination methods are usually laborious and time-consuming, which makes it urgent to develop computational methods for structure prediction. In this paper we therefore focus on the problem of protein and RNA structure prediction. For protein structure prediction, we introduce a homology modeling algorithm named DISthread, the web server and standalone package for the de novo folding algorithm trRosetta, and an assessment method, APD, for protein inter-residue distance prediction. For RNA structure prediction, we present a template-based algorithm called IProAlign.

In recent years, the increasing accuracy of protein inter-residue distance prediction has opened a new avenue for improving the performance of homology modeling algorithms. To this end, we develop a threading algorithm called DISthread, which combines 1-dimensional sequence information with 2-dimensional distance information. It first uses 1-dimensional information such as the sequence profile, secondary structure and solvent accessibility to generate an initial alignment, and then iteratively incorporates the predicted inter-residue distances into the scoring function. Benchmark tests on four independent datasets indicate that DISthread outperforms comparable algorithms that use sequence and contact information, such as HHpred, SPARKS-X, MUSTER, EigenTHREADER, map_align and CATHER, as well as the other distance-based threading method, DeepThreader. The harder the target protein, the greater the improvement of DISthread over these methods. By comparing with baseline methods and analyzing the predicted distances, we show that including 2-dimensional inter-residue distance information significantly improves the performance of our threading algorithm.

The trRosetta server is a web-based platform for fast and accurate protein structure prediction. Given an amino acid sequence or a multiple sequence alignment as input, a deep neural network is first used to predict inter-residue distances and orientations. The predicted geometries are then converted into restraints that guide structure prediction by direct energy minimization, which is implemented within the Rosetta framework. The trRosetta server distinguishes itself from similar structure prediction servers by its rapid and accurate de novo structure prediction. For illustration, trRosetta was applied to two Pfam families with unknown structures, for which the predicted de novo models were estimated to have high accuracy. To also take advantage of homology modeling, homologous templates are automatically used as additional inputs to the network. In general, it takes ~1 h to predict the final structure of a typical protein with ~300 amino acids. To enable large-scale structure modeling, a downloadable package of trRosetta with open-source code is available as well.

Significant progress has been achieved in distance-based protein folding, owing to the improved prediction of inter-residue distances by deep learning, and many efforts have been made in recent years to improve distance prediction further. However, the best way to objectively assess the accuracy of predicted distances remains unknown. To this end, we propose a method called APD for assessing inter-residue distance prediction. The assessment can be done in three flavors: prediction-oriented, native-oriented and full-list, depending on the set of residue pairs being assessed. A total of 19 metrics are proposed to measure the accuracy of predicted distances. These metrics are discussed and compared quantitatively on three benchmark datasets, with distances and structure models predicted by the trRosetta pipeline. Experimental results show that a few metrics, such as distance precision, correlate strongly with the model accuracy measure TM-score (Pearson’s correlation coefficient > 0.7). In addition, these metrics are applied to rank the distance prediction groups in CASP14, and the resulting ranking largely coincides with the official one. These data suggest that the proposed metrics are effective for measuring the accuracy of distance prediction.

IProAlign is a homology modeling method for RNA tertiary structure prediction. It first simplifies the linear arc diagram of the RNA secondary structure and aligns all base pairs by integer programming. The secondary structure alignment is then incorporated into the scoring function for global alignment. Throughout this process, a sequence profile calculated from the preprocessed multiple sequence alignment of the query sequence is used in the design of the scoring function. A benchmark test on the TE80 dataset shows that, after excluding templates that share more than 40% sequence identity with the query RNA, IProAlign generates better alignments than Foldalign, whose coverage is similar to that of IProAlign, and that the improvement is more significant for RNAs with pseudoknots. Results on the PUZ30 dataset show that the model quality of IProAlign is better than that of LocARNA, CARNA, RNAmountAlign and Foldalign. By comparing different strategies for using the multiple sequence alignment, we show that IProAlign uses it efficiently. In addition, an analysis of secondary structure prediction accuracy and alignment coverage shows that the model quality of IProAlign is associated with both factors to some extent. By comparing the running time of searches against a non-redundant template library, we show that IProAlign is faster than the other methods and that its speed is less affected by sequence length. Together, these results suggest that IProAlign is a fast and efficient algorithm for detecting homologous templates.
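To make the three APD assessment flavors more concrete, the following is a minimal sketch of how a distance precision score might be computed on prediction-oriented, native-oriented and full-list residue pairs, and then correlated with TM-score across targets. The abstract does not define the metrics, so everything here is an assumption made for illustration: the 15 Å cutoff, the 2 Å tolerance, the minimum sequence separation of 3, and the function names `select_pairs`, `distance_precision` and `pearson`. It should not be read as the actual APD implementation.

```python
import math


def select_pairs(pred, native, mode, cutoff=15.0, min_sep=3):
    """Return the residue pairs (i, j), i < j, to be assessed.

    mode (hypothetical names for the three flavors):
      'prediction' keeps pairs whose predicted distance is below cutoff,
      'native'     keeps pairs whose native distance is below cutoff,
      'full'       keeps every pair with sequence separation >= min_sep.
    """
    if mode not in ("prediction", "native", "full"):
        raise ValueError(f"unknown mode: {mode}")
    n = len(native)
    pairs = []
    for i in range(n):
        for j in range(i + min_sep, n):
            if mode == "prediction" and pred[i][j] >= cutoff:
                continue
            if mode == "native" and native[i][j] >= cutoff:
                continue
            pairs.append((i, j))
    return pairs


def distance_precision(pred, native, mode="prediction", tol=2.0):
    """Fraction of assessed pairs predicted within tol of the native distance.

    This definition is an assumption; the source only names the metric.
    """
    pairs = select_pairs(pred, native, mode)
    if not pairs:
        return 0.0
    hits = sum(1 for i, j in pairs if abs(pred[i][j] - native[i][j]) <= tol)
    return hits / len(pairs)


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

In use, `distance_precision` would be evaluated once per target, and `pearson` would then be applied to the list of per-target scores and the corresponding TM-scores of the predicted models. Keeping pair selection separate from scoring means the same metric can be reported under all three flavors without duplicating logic.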