Database Construction For Structural Genomics And Computational Analyses On Protein Backbone Conformations

Posted on: 2009-06-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Xu
Full Text: PDF
GTID: 1100360242995931
Subject: Bioinformatics
Abstract/Summary:
Proteins, one of the two most important classes of biomolecules, carry out almost all the activities of organisms, including catalysis of biochemical reactions, oxygen transport by hemoglobin in respiration, and antibody action in the immune system. Research on their structure and function illuminates the phenomena of life at the molecular level. Under physiological conditions, most native proteins fold into stable three-dimensional structures that determine their biological functions, and the diversity of protein functions arises from the variety of protein structures. Although protein structure has attracted growing attention as the foundation of functional research and protein design, experimental structure determination is time-consuming, expensive, and of limited success rate. Two approaches address this problem. On the one hand, several structural genomics projects have been carried out all over the world, using high-throughput methods to accelerate structure determination and reduce its cost. On the other hand, computer simulation is used to predict protein structures at lower cost. Although great improvements have been achieved, much work remains before protein structure prediction can be considered a complete solution.
This dissertation describes two main pieces of work: target selection/filtering and database construction for the structural genomics consortium of the Chinese Academy of Sciences (CAS), and enhanced sampling methods for molecular dynamics with their application to the study of side-chain and neighbor effects. It is summarized in the following three parts.
In the first part of Chapter 1, we give a brief introduction to protein structure and to commonly used structure determination methods, including experimental determination (X-ray diffraction and nuclear magnetic resonance spectroscopy) and theoretical prediction.
Theoretical prediction methods can be divided into physics-based and knowledge-based ones; the latter are further classified into homology modeling and fold recognition according to the sequence similarity to a template of known structure. Compared with experimental structure determination, theoretical prediction is fast, low-cost, and high-throughput, and it is an effective supplement to experiment. The second part of this chapter introduces structural genomics, including its outline, advantages, disadvantages, and current status. Structural genomics consists of the high-throughput determination of the three-dimensional structures of all proteins encoded by a whole genome. The large amount of experimental data produced must be stored, managed, and shared properly so that it can be mined. Meanwhile, the targets must be selected carefully because structure determination experiments are expensive, and the more information available about these targets, the better. The final part of this chapter presents a relatively detailed introduction to molecular dynamics, its history, and its current state of development. Molecular dynamics is a computational simulation method based on physical models; it can produce three-dimensional protein structures as well as detailed dynamical information on biomolecules and reactions that cannot be obtained from experiments. Its theoretical basis is the numerical solution of Newton's equations or the Schrödinger equation for the motion of the atoms. The semi-empirical potential functions used in most simulations share the same functional form and differ only slightly in their parameters. These functions usually include terms for bonded interactions (bond stretching, bond angle bending, and dihedral angle bending and rotation) and non-bonded interactions (electrostatic and van der Waals interactions).
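As an illustration of the bonded and non-bonded terms listed above, a typical semi-empirical potential can be written as follows. This is a generic textbook form and an assumption here, not the specific force field used in the dissertation; the symbols (force constants k, reference values with subscript 0, multiplicity n, phase δ, Lennard-Jones parameters ε_ij and σ_ij, partial charges q_i) are introduced only for this sketch, and the separate dihedral bending (improper) term is omitted for brevity.

```latex
\[
\begin{aligned}
V(\mathbf{r}) ={}& \sum_{\text{bonds}} k_b \,(b - b_0)^2
  + \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} k_\chi \left[ 1 + \cos(n\chi - \delta) \right] \\
&+ \sum_{i<j} \left\{ 4\varepsilon_{ij}
  \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12}
       - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]
  + \frac{q_i q_j}{4\pi\varepsilon_0 \, r_{ij}} \right\}
\end{aligned}
\]
```

The first line collects the bonded terms and the second the van der Waals (Lennard-Jones) and electrostatic (Coulomb) non-bonded terms.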
This form of the semi-empirical potential simplifies the calculation but also limits its accuracy. Much effort has been made to improve the accuracy, for example by taking polarization effects and the correlation of backbone dihedral angles into account. Another limitation on the application of molecular dynamics is the simulation time. Because atomic vibrations are of high frequency, a small time step is required, yet ergodic sampling of the Boltzmann distribution by conventional molecular dynamics needs very long simulations. Large-time-step methods that eliminate the atomic vibrations were developed to enable longer simulations. Another way to extend the simulation time is to reduce the computation time of each integration step, which involves implicit solvent models and many enhanced sampling methods. Implicit solvent models were developed to estimate the effects of solvation without treating too many solvent degrees of freedom explicitly. Many enhanced sampling methods developed in the last 20 years are introduced, including high-temperature molecular dynamics, amplified collective motions, conformational flooding, accelerated molecular dynamics, umbrella sampling, the Tsallis effective potential, and temperature/Hamiltonian replica exchange methods.
In Chapter 2, we introduce preliminary work on target selection/filtering and the construction of an efficient database-backed web system for target protein selection, data management, and target annotation for the high-throughput structural genomics consortium of the Chinese Academy of Sciences. The first part of this chapter describes the whole database system we constructed: its framework, requirements, and functions. It is an interactive web database system that allows the distributed participating groups to append, edit, and share experimental data and parameters as well as information predicted by bioinformatics methods.
The second part of this chapter describes the structural and functional annotation of the target proteins. From various bioinformatics sources, we collected and integrated structural and functional annotations of the proteins, including general annotation, physical features, secondary structures, conserved domains and functional motifs, and possible functions. The final part of this chapter describes the target selection/filtering procedure. Considering both experimental feasibility and potential biological importance, we devised a workflow for target filtering and prioritization, which yielded 1823 prioritized target proteins.
In Chapter 3, we present a Hamiltonian replica exchange approach and apply it to investigate the effects of various factors on the conformational equilibrium of the peptide backbone. In different replicas, biasing potentials of varying strengths are applied to all backbone (φ, ψ) torsional angle pairs to overcome sampling barriers. A general form for constructing biasing potentials from a reference free energy surface is employed to minimize sampling in physically irrelevant parts of conformational space. An extension of the weighted histogram analysis formulation allows conformational free energy surfaces to be computed using all replicas, including those with biased Hamiltonians. This approach can significantly reduce the statistical uncertainties in computed free energies. For the peptide systems considered, it allows effects of the order of 0.5-1 kJ/mol to be quantified using explicit solvent simulations. We applied this approach to capped peptides of 2 to 5 peptide units containing Ala, Phe, or Val in explicit water, and focused on how the conformational equilibrium of a single pair of backbone angles is influenced by changing the residue types of the same and neighboring residues as well as the conformations of neighboring residues.
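The exchange step of a Hamiltonian replica exchange scheme with biasing potentials of varying strengths can be sketched as below. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes each replica k uses H_k = H_0 + λ_k V_bias at a common temperature, and the names `swap_probability`, `attempt_swap`, `lam_i`, `lam_j`, `bias_xi`, and `bias_xj` are hypothetical.

```python
import math
import random

KB = 0.0083144626  # Boltzmann (gas) constant in kJ/(mol K)


def swap_probability(lam_i, lam_j, bias_xi, bias_xj, temperature=300.0):
    """Metropolis probability of swapping configurations between replicas i and j.

    Assumes H_k(x) = H_0(x) + lam_k * V_bias(x) with all replicas at the same
    temperature, so the unbiased energy H_0 cancels in the exchange criterion.
    bias_xi and bias_xj are V_bias evaluated at the current configurations of
    replicas i and j (kJ/mol).
    """
    beta = 1.0 / (KB * temperature)
    # beta * [H_i(x_j) + H_j(x_i) - H_i(x_i) - H_j(x_j)]
    delta = beta * (lam_i - lam_j) * (bias_xj - bias_xi)
    if delta <= 0.0:
        return 1.0
    return math.exp(-delta)


def attempt_swap(lam_i, lam_j, bias_xi, bias_xj, temperature=300.0, rng=random):
    """Return True if the proposed swap is accepted."""
    p = swap_probability(lam_i, lam_j, bias_xi, bias_xj, temperature)
    return rng.random() < p
```

Because only the bias term differs between replicas in this sketch, an exchange attempt needs the bias energies alone and no re-evaluation of the full force field.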
For the effects of changing the side chain type of the same residue, our results consistently showed an increased preference for the β conformation for Phe and Val relative to Ala. As for neighbor effects, our results not only indicated that they can be as large as the effects of changing the side chain type of the same residue, but also led to several new insights. For N-terminal neighbors, conformation seems to have a large effect: relative to the β conformer of an N-terminal neighbor, its α conformer stabilizes the β conformer of the following Ala regardless of the neighbor's residue type. For C-terminal neighbors, chemical identity seems to play a more important role: Val as the C-terminal neighbor significantly increases the PII propensity of the preceding Ala regardless of its own conformational state. These results agree well with reported statistics from protein coil structure libraries, indicating that such effects persist in short peptides as well as in proteins. We also observed other side chain identity and neighbor effects that were consistently reproduced in our simulations of different small peptide systems but are not displayed by coil library statistics.
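To give a sense of the energy scale involved, the free energy difference between two backbone conformers can be estimated from their equilibrium populations with the standard relation ΔG = -RT ln(p_b / p_a). The sketch below is illustrative only; the function name, the 300 K default, and the example populations are assumptions, not values from the dissertation.

```python
import math

R = 0.0083144626  # gas constant in kJ/(mol K)


def free_energy_difference(pop_a, pop_b, temperature=300.0):
    """Free energy of conformer b relative to conformer a, in kJ/mol.

    pop_a and pop_b are equilibrium populations (or any positive weights
    on a common scale); a negative result means b is favored over a.
    """
    if pop_a <= 0.0 or pop_b <= 0.0:
        raise ValueError("populations must be positive")
    return -R * temperature * math.log(pop_b / pop_a)
```

For example, hypothetical populations of 0.4 and 0.6 at 300 K give a difference of about -1.0 kJ/mol, so effects of 0.5-1 kJ/mol correspond to fairly modest population shifts, which is why low statistical uncertainty matters.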
Keywords/Search Tags: structural genomics, database, target selection, structural and functional annotation, Hamiltonian replica exchange, backbone conformation, conformational free energy, side chain effects, neighbor effects