Knowledge-based Composite Scoring Function For Protein Structure Prediction

Posted on: 2020-08-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X X Wang
GTID: 1360330599461838
Subject: Theoretical Physics
Abstract/Summary:
The three-dimensional (3D) structure of a protein is crucial for investigating its function and for related drug design. Over the past decades, various schemes have been proposed for predicting the 3D structures of proteins from their amino acid sequences. Protein structure prediction usually involves two basic processes: conformational sampling and ranking of conformations. In the ranking step, the accuracy of the structural evaluation function, also known as the scoring function, determines how well near-native structures can be discriminated from decoys among a huge number of protein conformations. According to statistical mechanics, the stable state of a physical system is the one with the lowest free energy. Proteins, which are composed of many atoms, follow the same rule. From the perspective of physics, an ideal scoring function should therefore assign the native structure the lowest free energy among all putative conformations of a protein.

Given the rapidly increasing number of experimentally determined protein structures in the Protein Data Bank (PDB), knowledge-based scoring methods have received extensive attention over the past three decades and have achieved great success in ab initio protein structure prediction and structure refinement. However, for the sake of speed and simplicity, current scoring functions often consider only nonbonded interactions and neglect conformational entropies and bonded potentials such as covalent bonds and angles. Although such scoring functions may succeed on fully relaxed conformations, they have difficulty ranking decoys with distorted bonds or angles, especially when used for conformational sampling in structure prediction. As a result, such a scoring function may perform well on one or several decoy sets but often has limited accuracy on large, diverse sets. To address these limitations, this thesis presents the following studies.

1. A critical step in developing a knowledge-based scoring function is finding an appropriate representation of protein structures. Despite significant progress in simplifying residues into reduced alphabets, few studies have addressed the optimal number of atom types for proteins. Through a series of analyses based on the statistical mechanics-based iterative method, we found that 4 atom types may suffice for investigating the basic folding mechanism of proteins, whereas 14 atom types are needed to describe protein interactions accurately at the atomic level.

2. Although considerable progress has been made in calculating potential energies for protein structure prediction, the computation of protein entropies has lagged far behind. Instead of running computationally expensive Molecular Dynamics (MD) or Monte Carlo (MC) simulations, we obtained the entropies of protein structures from the normalized probability distributions of backbone dihedral angles observed in native structures. The resulting knowledge-based scoring function, which includes the backbone entropies and is referred to as ITScoreDA or ITDA (ITerative Score of Dihedral angle and Atom pair scoring function), was extensively evaluated on 16 commonly used decoy sets. ITDA was significantly superior to the 50 other tested scoring functions in selecting native structures from decoys. This study highlights the role of backbone conformational entropies in protein structures and provides a way to estimate the entropic effect quickly.

3. As noted above, scoring functions that neglect bonded potentials such as covalent bonds and angles have difficulty ranking decoys with distorted bonds or angles, especially when used for conformational sampling. To address this limitation, we developed a composite knowledge-based scoring function based on ITDA, named ITCPS (ITerative ComPosite Scoring function), which integrates bonded and nonbonded potentials as well as orientation-dependent and hydrophobic interactions. ITCPS was extensively evaluated on 18 decoy sets comprising 927 proteins and compared with 51 other scoring functions. Overall, ITCPS performed best among the 52 scoring functions and achieved good performance on all the test sets. It recognized the native structure for 842 of the 927 proteins, giving a success rate of 90.8% and an average Z-score of 3.36. ITCPS also showed a strong ability to identify the best near-native structure among decoys, performing significantly better than the other tested scoring functions.

In summary, this thesis presents an extensive study of knowledge-based scoring functions for protein structure prediction. We first investigated the atom typing problem for proteins. We then developed an approach for calculating the backbone entropies of protein structures and incorporated it into the ITDA scoring function. Finally, we proposed a composite scoring function, ITCPS, which consists of bonded and nonbonded potentials as well as orientation-dependent and hydrophobic interactions. Our model is also expected to benefit the development of scoring functions for other interactions.
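The success rate and Z-score quoted above can be illustrated with a short sketch. The abstract does not define its Z-score, so the code below assumes the convention commonly used in decoy discrimination: lower scores are better, and the Z-score is the gap between the mean decoy score and the native score in units of the decoy standard deviation. The function names, the use of the population standard deviation, and the toy scores are all assumptions made for illustration, not the thesis's implementation.

```python
from statistics import mean, pstdev


def native_z_score(native_score, decoy_scores):
    """Z-score of the native structure relative to its decoys.

    Assumed convention (not stated in the abstract): lower score is better,
    so a large positive Z means the native scores well below the decoys.
    """
    sigma = pstdev(decoy_scores)  # population std. dev.; an assumption
    if sigma == 0:
        raise ValueError("decoy scores have zero spread; Z-score undefined")
    return (mean(decoy_scores) - native_score) / sigma


def native_is_recognized(native_score, decoy_scores):
    """True if the native structure has the strictly lowest score."""
    return native_score < min(decoy_scores)


def success_rate(n_recognized, n_total):
    """Percentage of proteins whose native structure is ranked first."""
    return 100.0 * n_recognized / n_total


# Toy scores for one hypothetical protein (illustrative numbers only).
toy_native = -10.0
toy_decoys = [-4.0, -2.0, 0.0, 2.0, 4.0]
toy_z = native_z_score(toy_native, toy_decoys)
toy_hit = native_is_recognized(toy_native, toy_decoys)

# Counts reported in the abstract: 842 of 927 natives recognized.
reported_rate = success_rate(842, 927)
```

With the counts reported in the abstract, `success_rate(842, 927)` rounds to the stated 90.8%. An average Z-score over a benchmark would then be the mean of `native_z_score` across all proteins, again under the assumed sign convention.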
Keywords/Search Tags: protein structure prediction, scoring function, statistical potentials, knowledge-based, protein folding, structure prediction