
Study On Approaches Of Protein Structure Prediction

Posted on: 2010-06-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Chang
GTID: 1100360275451140
Subject: Biomedical engineering
Abstract/Summary:
With the completion of the Human Genome Project (HGP), the life sciences have turned to the study of gene products, i.e. proteins, and proteomics has become a pioneering and active theme of the post-genomic era. Research in this era centers on the interactions between proteins and ligands and on the relationship between protein structure and function. Studying the interactions and recognition between a protein receptor and its ligand is very important for understanding the molecular mechanisms of proteins in the cell, for computer-aided drug design, and for predicting the structures of protein-protein complexes. It is difficult to determine the structure of a protein complex by experimental methods. Recently, with the continuous growth of computer processing power and the rapid development and wide application of theoretical simulation, molecular modeling methods have become important tools for exploring how protein receptors interact with ligands. In this thesis, a series of studies on the mechanisms of protein interaction and recognition were carried out using molecular docking and amino acid network methods. The thesis covers the following major aspects:

(1) Study on the scoring function of protein molecular docking

Two scoring functions were proposed. One was a combinatorial scoring function, ComScore, designed specifically for the Others-type protein-protein complexes. ComScore is composed of the atomic contact energy and the van der Waals and electrostatic interaction energies, with the weight of each term fitted by multiple linear regression.
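The idea of a weighted combination of energy terms, with weights obtained by multiple linear regression, can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not state the regression target, whether an intercept was used, or how the terms were computed, so the function names `fit_weights` and `combined_score`, the term order, and the absence of an intercept are all assumptions.

```python
def fit_weights(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    features: list of rows, one per structure, e.g. [ace, vdw, elec]
              (hypothetical term values; the term order is an assumption)
    targets:  list of reference values the combined score should reproduce
              (what the thesis regressed against is not stated in the abstract)
    """
    n = len(features[0])
    # Build the augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in features) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(features, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("terms are linearly dependent; cannot fit weights")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def combined_score(weights, terms):
    """Weighted sum of the energy terms for one docked structure."""
    return sum(w * t for w, t in zip(weights, terms))
```

Once the weights are fitted on a training set, `combined_score` can be applied to each docked decoy and the decoys ranked by the resulting value.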
Tests on 17 Others-type complexes from CAPRI benchmark 1.0 demonstrated that the combinatorial scoring function can capture the interaction features of the Others-type complexes, reflect the energy change during complex formation, and discriminate, to a certain extent, effective structures from large numbers of docked decoys. ComScore was used in the scoring tests of CAPRI rounds 9-12, with good results in rounds 9 and 11.

The other was a scoring function based on the amino acid network. Studying the topology of protein-protein complexes gives insight into the mechanisms of protein-protein interactions. In this thesis, residue networks were constructed by defining the amino acid residues as the vertices and the atom contacts between them as the edges. The residue network of a protein complex was divided into two types of networks, i.e., the hydrophobic and the hydrophilic residue networks. Analysis of these two types of networks shows that they have small-world properties. Furthermore, analysis of the network parameters shows that correctly bound complex conformations have both a higher sum of interface degree values and a lower characteristic path length than incorrect ones. These features indicate that correctly bound conformations have better geometric and residue-type complementarity, and that the correct binding mode is very important for preserving the characteristic path length of the native protein complex. In addition, two scoring terms based on the network parameters are proposed, which take into account the shape of the entire complex and residue-type complementarity. These network-based scoring terms were also used in conjunction with other scoring terms to devise a new multi-term scoring function, HPNCscore. It improves the discrimination of RosettaDock's combined scoring function by more than 12%.
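The residue network construction and the two parameters named above (degree and characteristic path length) can be sketched as follows. This is an illustrative sketch under stated assumptions: the 5.0 Å contact cutoff, the input layout (a dict of residue ids to atom coordinates), and the function names are hypothetical, since the abstract does not give the thesis's contact definition or how interface residues were selected.

```python
from collections import deque
from itertools import combinations
import math


def build_residue_network(residues, cutoff=5.0):
    """residues: dict mapping residue id -> list of (x, y, z) atom coordinates.

    Two residues are linked if any pair of their atoms lies within `cutoff`
    (the cutoff value is an assumption, not taken from the thesis).
    """
    graph = {rid: set() for rid in residues}
    for a, b in combinations(residues, 2):
        if any(math.dist(p, q) <= cutoff
               for p in residues[a] for q in residues[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def characteristic_path_length(graph):
    """Mean shortest-path length over all connected vertex pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def interface_degree_sum(graph, interface):
    """Sum of degree values over the residues given as interface residues."""
    return sum(len(graph[r]) for r in interface)
```

To separate hydrophobic and hydrophilic networks, the same functions could be applied to the subset of residues of each type; how the thesis partitions residue types is not specified in the abstract.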
This work can enhance our knowledge of the mechanisms of protein-protein interaction and recognition, and can also be used in protein design.

(2) Improvement on the searching algorithm of protein molecular docking

Molecular docking must find lower-energy structures in less time, so another important issue is the efficiency of the search algorithm. In other words, the efficiency of docking programs can be improved by using new theories and computational methods. AutoDock 3.0 is a widely used docking program developed by Professor Olson's group at the Scripps Research Institute, and it has achieved great success in predicting the binding modes and conformations of protein receptors and ligands. In this thesis, based on an analysis of the AutoDock 3.0 algorithm, we proposed five parallel schemes using the message passing interface (MPI) library. We evaluated these schemes in three ways: reliability, sensitivity to five input parameters, and the influence of the number of processors. In the reliability test, parallel scheme 5 and the serial program were applied to 10 protein-ligand systems, and the docking results indicate the validity and reliability of the parallel programs. In the parameter analysis, we varied five parameters: the number of energy evaluations, the population size, the frequency of local search, the number of local search iterations, and the number of docking runs. The influence of these parameters on the five schemes was analyzed, which provides guidance for using the parallel program in virtual screening. In the processor-count test, the hybrid parallel scheme 5 combines the characteristics of the other schemes. As the number of processors increases, the hybrid scheme uses the processors effectively and maintains a higher speedup and parallel efficiency.
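Speedup and parallel efficiency, the two metrics mentioned above, have standard definitions that can be written down directly. The sketch below also shows one plausible way to divide independent docking runs among processes; that block distribution is an assumption for illustration only, since the abstract does not describe how any of the five schemes assigns work, and all function names are hypothetical.

```python
def split_runs(n_runs, n_procs):
    """Block-distribute independent docking runs over processes.

    Returns a list with the number of runs assigned to each rank.
    This distribution is an illustrative assumption, not one of the
    thesis's five schemes.
    """
    base, extra = divmod(n_runs, n_procs)
    return [base + (1 if rank < extra else 0) for rank in range(n_procs)]


def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Efficiency E = S / p, where p is the number of processors."""
    return speedup(t_serial, t_parallel) / n_procs
```

An efficiency close to 1 means the added processors are doing useful work; a value that drops as processors are added points to load imbalance or communication overhead.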
The parallel improvement enhances the efficiency of the molecular docking program AutoDock 3.0, which can help computer-aided drug design and virtual screening.

In addition, we improved AutoDock 3.0 by replacing the global search of the genetic algorithm (GA) with the Ant Colony Optimization (ACO) method. Tests on 22 protein-ligand systems show that the ACO method improves the search results. Moreover, with or without local search, ACO performs clearly better than GA, and its energy converges faster than that of GA. The introduction of this new search technique, the ACO method, offers some guidance for improving docking programs.

(3) Study on the amino acid networks of proteins

The three-dimensional structure of a protein can be treated as a complex network composed of amino acids, and the network properties can help us understand the relationship between protein structure and function. Since the amino acid network of a protein is formed during protein folding, general network models have difficulty explaining how it evolves. From the perspective of protein folding, we proposed an evolving model for amino acid networks. In our model, the evolution starts from the amino acid sequence of a native protein and is guided by two generic assumptions, i.e. the neighbor preferential rule and the energy preferential rule. It is found that the neighbor preferential rule dominates the general network properties, while the energy preferential rule dominates the specific biological structure characteristics. Applied to native proteins, our model reproduces the features of the amino acid networks well.

In addition, the conservation residue network was constructed and studied.
Identifying protein interfaces is crucial for predicting protein-protein interactions and for protein functional classification. In this thesis, the protein structure was modeled as an undirected graph with the conserved amino acid residues as the vertices and the atom contacts between them as the edges. It is found that the conservation residue networks are characterized by intermediate values of clustering coefficient and characteristic path length, which is a typical property of small-world networks. The residues on protein interfaces typically have higher degree values and lower clustering coefficient values than the surface residues. Additionally, the spatial clustering of conserved residues is found to be a general phenomenon. These results indicate that the properties of the conservation residue network can provide new parameters for protein-protein interface prediction.
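The clustering coefficient used above to distinguish interface residues from surface residues can be computed per vertex as the fraction of neighbor pairs that are themselves connected. The sketch below uses the standard local definition on an undirected graph stored as a dict of neighbor sets; this representation and the function names are assumptions, and the abstract does not say which variant of the coefficient the thesis used.

```python
from itertools import combinations


def clustering_coefficient(graph, node):
    """Local clustering coefficient: 2 * links / (k * (k - 1)).

    graph: dict mapping each vertex to the set of its neighbors.
    Vertices with fewer than two neighbors are given 0.0 by convention.
    """
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def mean_clustering(graph):
    """Average of the local clustering coefficients over all vertices."""
    return sum(clustering_coefficient(graph, n) for n in graph) / len(graph)
```

Comparing `clustering_coefficient` and the degree `len(graph[node])` for candidate residues gives the two per-residue quantities that the abstract reports as differing between interface and surface residues.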
Keywords/Search Tags: protein molecular docking, scoring function, searching algorithm, amino acid network, evolving network model