A Deep Learning Framework For Improving Long-range Residue-residue Contact Prediction Using A Hierarchical Strategy

Posted on: 2018-06-11; Degree: Doctor; Type: Dissertation
Country: China; Candidate: D P Xiong; Full Text: PDF
GTID: 1360330566988081; Subject: Biology
Abstract/Summary:
Appropriate residue-residue contacts are known to play key roles in maintaining the native conformation of proteins and in guiding protein folding. Residue-residue contact prediction is therefore of great value for protein structure prediction: contact information, especially for long-range residue pairs, can be used to directly guide the reconstruction of three-dimensional protein structures, to reduce the conformational sampling space by improving the minimum of the funnel-shaped landscape of the overall energy function, and to improve model evaluation and selection through the construction of a scoring function. Its applications have also been extended to rational drug design. Although residue-residue contact prediction has been studied in depth in recent years, and the international CASP competition has greatly contributed to the development of the field, prediction accuracy is still far from satisfactory, so the problem continues to attract increasing attention.

Current methods for residue-residue contact prediction can be categorized as template-based or sequence-based. Template-based methods make predictions from homologous templates and are thus limited in usefulness. Sequence-based methods, which require only the amino acid sequence, have been investigated more actively because of their greater research value. They mainly comprise machine-learning-based methods and coevolution-based methods. The former train various machine learning models on statistical information retrieved from the structure database, whereas the latter mainly exploit coevolutionary information derived from multiple sequence alignments of non-redundant homologous sequences. Previous studies have shown that these two kinds of sequence-based methods can be combined to further improve residue-residue contact prediction.

In this work, we developed a package, DeepConPred, which comprises a pipeline of two deep-learning-based models (DeepCCon and DeepRCon) followed by a refinement step, to effectively combine statistical information retrieved from the structure database with coevolutionary information extracted from the sequence database for long-range residue-residue contact prediction. DeepConPred uses a hierarchical architecture: the coarse contacts between secondary structure elements predicted by DeepCCon in the first stage facilitate the prediction of long-range residue-residue contacts by DeepRCon in the second stage. For both DeepCCon and DeepRCon, we proposed a number of novel features and combined them with well-known features to describe protein structural properties more comprehensively. Furthermore, to improve both the coarse contact prediction and the long-range residue-residue contact prediction, we employed feature selection to choose the most discriminative feature subsets and deep learning to construct the models. Specifically, DeepCCon and DeepRCon were trained using coevolutionary information derived from a reduced number of non-redundant homologous sequences to ensure robustness for small-family proteins, while the subsequent refinement step was designed to integrate the full coevolutionary information derived from all non-redundant homologous sequences to improve prediction for large-family proteins. Extensive experiments suggest that DeepConPred effectively improves long-range residue-residue contact prediction and can be regarded as a highly competitive method.
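The hierarchical idea described above, in which coarse contacts between secondary structure elements inform residue-level prediction, might be illustrated roughly as follows. This is only a minimal sketch and not the DeepConPred implementation: the function name `expand_coarse_to_residue`, the segment representation, and the example probabilities are all hypothetical, and the sequence-separation threshold of 24 residues for "long-range" pairs is an assumed convention that the abstract does not state. The sketch simply broadcasts an element-level contact probability to every long-range residue pair spanning the two elements, producing a feature map that a second-stage model could consume.

```python
from typing import List, Tuple

# A secondary structure element as (start, end) residue indices, end exclusive.
# This representation is an assumption made for illustration.
Segment = Tuple[int, int]


def expand_coarse_to_residue(
    segments: List[Segment],
    coarse_prob: List[List[float]],
    seq_len: int,
    min_separation: int = 24,  # assumed long-range threshold, not from the abstract
) -> List[List[float]]:
    """Broadcast element-level contact probabilities to long-range residue pairs."""
    # Map each residue to the element containing it (-1 means no element, e.g. a loop).
    owner = [-1] * seq_len
    for idx, (start, end) in enumerate(segments):
        for r in range(start, end):
            owner[r] = idx

    feature = [[0.0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            if abs(i - j) < min_separation:
                continue  # skip pairs that are not long-range
            a, b = owner[i], owner[j]
            if a >= 0 and b >= 0 and a != b:
                # Residue pair inherits the probability of its two elements.
                feature[i][j] = coarse_prob[a][b]
    return feature


# Hypothetical example: two elements in a 40-residue chain.
segments = [(0, 10), (30, 40)]
coarse = [[0.0, 0.8], [0.8, 0.0]]
feat = expand_coarse_to_residue(segments, coarse, seq_len=40)
print(feat[5][35], feat[9][30], feat[5][20])
```

In this toy example, residues 5 and 35 lie in different elements and are 30 positions apart, so the pair inherits the coarse probability; residues 9 and 30 are only 21 positions apart and residue 20 belongs to no element, so both of those pairs stay at zero.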
Keywords/Search Tags: coarse contact prediction, long-range residue-residue contact prediction, deep learning, feature selection