Statistical Modeling And Theoretical Analysis Of Biology-Related Systems

Posted on: 2012-03-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Liu
GTID: 1110330371455359
Subject: Chemistry
Abstract/Summary:
With the arrival of the post-genome era, proteomics has become an important research domain in the life sciences, and the focus of biological research is gradually shifting toward exploring the function of whole biological systems. Proteins are among the most important macromolecules in biological systems and carry out biological functions. A polypeptide is a single linear polymer chain of amino acids joined by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. The function of a protein depends on its three-dimensional structure and dynamic properties, and most proteins exist in unique conformations exquisitely suited to their function. Therefore, to understand the molecular mechanisms of biological activity and to provide useful information for structure-based drug design, it is highly valuable to elucidate the relationship between protein structure and function. There are two ways to approach this research: statistical modeling and theoretical calculation. Statistical modeling helps extract useful information from large quantities of data and can be used to explain and predict the related activities, while theoretical calculation provides accurate atomic-level analysis of biological systems, such as the interaction between donor and receptor, charge distribution, and so on. In this dissertation, we use these two methods to study the relationship between the structure and function of proteins and peptides:
(1) Quantitative structure-property relationships (QSPRs) have been developed to predict the ion mobility spectrometry (IMS) drift time tD for a set of 1481 peptides using information directly derived from molecular structures.
The relationship between peptide structure and the drift time tD was modeled using partial least squares regression (PLS), least-squares support vector machine (LSSVM), and Gaussian process (GP) methods coupled with genetic algorithm (GA) variable selection. Analysis of the constructed models shows that: (ⅰ) among these models, the linear PLS was incapable of capturing all dependences in this peptide system, whereas the nonlinear LSSVM and GP methods showed good statistical performance in reproducing peptide mobility behavior. Moreover, since GP was able to handle both linear and nonlinear relationships, it gave a stronger fitting ability and better predictive power than LSSVM; (ⅱ) the relationship between the structural features of peptides and ion drift times in IMS is mainly nonlinear, although the statistics suggest that a linear relationship exists as well; (ⅲ) systematic analysis of the GA-GP model showed that diverse properties contribute markedly to the relationship between drift time and peptide structure. In particular, structural topological information and charge distribution contribute significantly to the drift time of peptides.
(2) QSPRs based on constitutional, topological, geometrical, and electrostatic descriptors are developed for 2454 ¹³Cα NMR chemical shifts of 21 high-quality monomeric proteins of known structure. In this procedure, a heuristic approach is employed for variable selection to obtain a small number of independent and significant descriptors. Coupled with various machine learning methods, including multiple linear regression (MLR), PLS, LSSVM, random forest (RF), and GP, these selected variables are then used to build both linear and nonlinear statistical models of the experimentally determined ¹³Cα NMR chemical shifts of proteins.
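QSPR models of this kind are usually compared by how well they fit the training data and how well they predict held-out samples. The following is only a minimal sketch of two such statistics, the fitted r² and a leave-one-out cross-validated q², for a one-descriptor least-squares line. It is not the dissertation's code: the function names, the single-descriptor model, and the convention q² = 1 − PRESS/SS_tot (with SS_tot taken about the mean of all responses) are assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def r_squared(xs, ys):
    """Coefficient of determination of the line fitted to all samples."""
    a, b = fit_line(xs, ys)
    y_bar = mean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out q2: refit without sample i, then predict sample i."""
    y_bar = mean(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - press / ss_tot


if __name__ == "__main__":
    # Hypothetical descriptor/response values, for illustration only.
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
    response = [2.1, 3.9, 6.2, 7.8, 10.1]
    print("r2 =", round(r_squared(descriptor, response), 4))
    print("q2 =", round(q_squared_loo(descriptor, response), 4))
```

For a least-squares fit, q² computed this way cannot exceed r², so a large gap between the two is a common warning sign of overfitting.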
In addition, we carried out quantum-chemical calculations of the ¹³Cα NMR chemical shifts for the 20 naturally occurring amino acids to investigate the relationship between the ¹³Cα NMR chemical shift and the geometrical structure of the amino acid using the QSPR approach. We demonstrate that: (ⅰ) the MLR method describes the relationship between protein structure and the ¹³Cα NMR chemical shift very well, with r² = 0.800, q² = 0.795, and r²pred = 0.770; (ⅱ) among the five models, the RF model gives the best results, with r² = 0.944, q² = 0.830, and r²pred = 0.824; (ⅲ) the nonlinear methods were generally better than the linear methods, but the linear MLR method also achieves satisfactory results, which means that the influence of the local protein environment on the ¹³Cα NMR chemical shifts is mainly linear, with nonlinear marginal effects also influencing the shifts to some extent; (ⅳ) an amino acid in a helix state is more deshielded (downfield shift), while for an amino acid in a strand state the ¹³Cα atom is shielded; (ⅴ) once the conformation of a protein is established, the ¹³Cα NMR chemical shift of a given amino acid residue is determined mainly by its own backbone and side chain; however, the environment of the residue in the protein is also a very important factor affecting the chemical shift.
(3) A systematic theoretical investigation of the interaction energies of halogen-ionic bridges formed between halide ions and the polar H atoms bonded to N in protein moieties has been carried out using QM and hybrid QM/MM methods. In this procedure, full geometry optimizations are performed at the Møller-Plesset second-order perturbation (MP2) level of theory in conjunction with Dunning's augmented correlation-consistent basis set, aug-cc-pVDZ. Subsequently, two distinct basis sets, i.e., 6-311++G(df,pd) and aug-cc-pVTZ, are employed in single-point calculations to check the stability of the results obtained at the different levels of DFT. The results lead to the following remarks: (ⅰ) most of the tested DFT functionals perform well in determining ΔEint of halide-binding complexes, and the hybrid functionals generally yield smaller deviations than the corresponding pure ones; (ⅱ) the relatively small basis set 6-311++G(df,pd) is an appropriate choice that can accurately describe the ΔEint of fluoride and chloride interacting with model protein moieties; (ⅲ) the HF, AM1, and PM3 methods tested in this work have a strong tendency to underestimate the binding energies of all halide adducts, especially fluoride-binding complexes; (ⅳ) the widely used functional B3LYP does not appear to be the best functional for describing the ΔEint of halide-moiety interactions; (ⅴ) B98, B97-1, and M05 give the lowest RMSE for fluoride-binding energies; the best performance for chloride-binding energies is obtained with M05-2X, MPW1B95, and MPW1PW91; the best results for bromide-binding energies are obtained with B97-1, PBEKCIS, and PBE1KCIS; and B97-1, MPW1PW91, and TPSS give the lowest RMSE for iodide-binding energies. In addition, the PBE1KCIS functional provides accuracy close to that of the computationally expensive MP2 method for calculating the ΔEint of halide adducts.
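The comparison of methods by RMSE can be illustrated with a short sketch. It assumes the common supermolecular definition ΔEint = E(complex) − E(halide) − E(moiety), which the abstract does not spell out, and it assumes that each method is scored against reference interaction energies (for example from MP2). The function names, the method labels, and all numerical values are hypothetical placeholders, not results from the dissertation.

```python
import math

# Conversion factor from Hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509


def interaction_energy(e_complex, e_halide, e_moiety):
    """Supermolecular interaction energy in kcal/mol from energies in Hartree.

    Assumed definition: dE_int = E(complex) - E(halide) - E(moiety).
    No counterpoise or other correction is applied in this sketch.
    """
    return (e_complex - e_halide - e_moiety) * HARTREE_TO_KCAL_PER_MOL


def rmse(predicted, reference):
    """Root-mean-square error between two equally long lists of energies."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def rank_methods(results, reference):
    """Return (method, RMSE) pairs sorted from lowest to highest RMSE."""
    scored = [(name, rmse(values, reference)) for name, values in results.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical interaction energies (kcal/mol) for three complexes.
    reference = [-25.0, -18.0, -12.0]
    results = {
        "method_A": [-24.5, -18.4, -12.3],
        "method_B": [-21.0, -15.5, -10.0],
    }
    for name, error in rank_methods(results, reference):
        print(f"{name}: RMSE = {error:.2f} kcal/mol")
```

Ranking by RMSE over a set of complexes rewards methods that are consistently close to the reference, which is why different functionals can come out best for different halides.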
Keywords/Search Tags: Statistical modeling, theoretical calculation, QSAR/QSPR, quantum chemical, QM/MM, protein, peptide, drift time, ¹³Cα NMR chemical shift, halide-binding, halide motif, interaction energy