Studies On The Relationship Between Extremophile Thermostability And Protein Sequence, Structure And Function By Bioinformatics Method

Posted on: 2006-03-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y R Ding
GTID: 1100360155952453
Subject: Fermentation engineering
Abstract/Summary:
Thermophilic enzymes from extremophiles have both theoretical and practical value for studying enzyme evolution, the molecular mechanism of protein thermostability and the highest enzyme reaction temperature. Understanding how protein sequence, structure and function influence extremophile thermostability makes it possible not only to describe the physicochemical principles underlying protein folding and stability, but also to design novel thermophilic enzymes that act at high temperature.

In this work, all complete prokaryotic proteomes in the NCBI COG database were studied, and the factors that influence protein thermostability were analyzed at the levels of protein sequence, structure and function. The factors identified provide a theoretical basis for improving protein thermostability in experimental biology.

First, all usable complete prokaryotic proteomes in the NCBI COG database were sorted using Perl programs and a MySQL database, and a protein-related database was constructed. The database comprised a "prokaryotic protein sequence dataset", a "prokaryotic protein structure dataset" and a "prokaryotic protein function dataset".

Primary structure is central to the study of protein thermostability. The relationship between dipeptide type and composition was analyzed, and the influences of dipeptide composition, residue number and single amino acid composition on protein thermostability were compared.
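A minimal sketch of the two sequence features compared in the passage above. It assumes that amino acid composition means the fraction of each of the 20 standard residues, and that dipeptide composition means the fraction of each of the 400 ordered pairs of adjacent residues. The abstract does not state the exact normalization used in the thesis, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the fraction of each ordered adjacent residue pair (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Skip pairs containing non-standard symbols (assumed handling).
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(valid)
    total = len(valid)
    return {a + b: (counts[a + b] / total if total else 0.0)
            for a, b in product(AMINO_ACIDS, repeat=2)}
```

For the short sequence "KVKV", the adjacent pairs are KV, VK and KV, so dipeptide_composition("KVKV") assigns 2/3 to KV and 1/3 to VK, while amino_acid_composition gives 0.5 each for K and V.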
The results indicated that the composition of Lys, Arg, VK, KI, YK, IK, KV, KY and EV in archaea and the composition of Lys, Glu, Tyr, Phe, Val, Ile, KE, EE, EK, YE, VK, KV, KK, LK, EI, EV, RK, EF, KY, VE, KI, KG, EY, FK, KF, FE, KR, VY, MK, WK and WE in bacteria correlate significantly and positively with protein thermostability, while the composition of Asp, Thr, Gln, His, DA, AD, TD, DD, DT, HD, DH, DR and DG in archaea and the composition of Gln, Ala, His, Trp, Thr, Asp, WQ, AA, QA, MQ, AW, QW, QQ, RQ, QH, HQ, AD, AQ, WL, QL, HA and DA in bacteria correlate significantly and negatively with protein thermostability. The characteristic dipeptides not only showed the relationship between dipeptide composition and protein thermostability, but also indicated that the influence of dipeptide composition is larger than that of single amino acid composition.

To test the influence of amino acid composition and dipeptide composition on protein thermostability, the prediction accuracies of support vector machines, a Bayesian approach and K-Nearest Neighbors in predicting hyperthermophilic, thermophilic and mesophilic proteins were compared, and support vector machines were selected as the most suitable machine learning method for predicting protein thermostability from amino acid composition. The three types of proteins were then predicted by support vector machines based on amino acid composition, dipeptide composition, and amino acid composition plus dipeptide composition. The local prediction accuracies were 82.43%, 83.33% and 84.20%, respectively. These accuracies not only indicate that primary structure determines protein thermostability, but also suggest that the influence of dipeptide composition on protein thermostability is larger than that of single amino acid composition.

Three-dimensional structure determines protein function and properties.
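As an illustration of the sequence-based prediction step described in the preceding passage, the sketch below classifies a sequence from its amino acid composition with a plain k-nearest-neighbours vote. K-Nearest Neighbors is one of the three methods the thesis compared, not the support vector machine it finally selected. The training sequences are invented placeholders that are only enriched in residues the abstract reports as positive (Lys, Glu) or negative (Gln, Ala, His); they are not data from the thesis, and all names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq):
    """Amino acid composition as a 20-dimensional list of fractions."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def knn_predict(train, query_seq, k=3):
    """Majority label among the k training sequences nearest to query_seq.

    train is a list of (sequence, label) pairs; distance is Euclidean
    distance between composition vectors (an assumed choice).
    """
    query = composition_vector(query_seq)
    ranked = sorted(
        (math.dist(composition_vector(seq), query), label)
        for seq, label in train
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy sequences for illustration only, NOT real proteins.
TOY_TRAIN = [
    ("KEKEIVKEKV", "thermophilic"),
    ("EKVKEIKKEY", "thermophilic"),
    ("KVEEKIKEVK", "thermophilic"),
    ("QAAQHTDAQA", "mesophilic"),
    ("AQTDAAQHQA", "mesophilic"),
    ("DAQAHQTAAQ", "mesophilic"),
]

print(knn_predict(TOY_TRAIN, "KEEKVKIEKK"))
```

A real comparison would use labelled proteomes, add the dipeptide features, and estimate accuracy by cross-validation; none of that is reproduced here.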
Then, the influence of secondary structure properties, hydrogen bonds, salt bridges, solvent-accessible area, compactness, hydrophobicity, the number and volume of cavities, and temperature factors on protein thermostability was studied in the "prokaryotic proteome structure dataset". The results indicated that all of these factors except salt bridges differ markedly between archaeal and bacterial proteins. As protein thermostability increases in archaeal proteins, the total hydrogen bond number, main chain-main chain hydrogen bonds, uncharged-uncharged hydrogen bonds and exposed residue number decrease dramatically, while the total salt bridge number and salt bridge networks increase dramatically. In bacterial proteins, uncharged-uncharged hydrogen bonds decrease dramatically, while charged-charged hydrogen bonds, total salt bridge number, salt bridge networks, and the main chain, side chain and whole chain temperature factors increase dramatically. Archaeal and bacterial proteins therefore have different molecular mechanisms for withstanding high temperature.

Proteins with the same function but different thermostability should follow some regularities in their evolutionary history. To discern the evolutionary relationships of archaeal proteins, thirteen aminoacyl-tRNA synthetases were first selected to verify the reliability of the finding, drawn from phylogenetic analyses of aminoacyl-tRNA synthetase molecules, that the universal tree of life consists of three domains: the archaea, bacteria and eucarya. The evolutionary relationships among 13 archaea were then analyzed through phylogenetic analyses of aminoacyl-tRNA synthetase molecules. According to the evolutionary distances, the four oldest organisms (Methanococcus jannaschii, Methanopyrus kandleri, Methanothermobacter thermautotrophicus, Methanosarcina acetivorans C2A) were selected to discern the evolutionary relationships of proteins with different thermostability.
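The abstract does not say which distance measure or phylogenetics software was used. As a hedged illustration of what an evolutionary distance between two aligned sequences can look like, the sketch below computes the simple p-distance (the fraction of differing residues over ungapped alignment columns). The choice of p-distance and the function names are assumptions made for this example.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m in names for n in names}
```

A matrix of this kind is the usual input to distance-based tree building such as neighbour joining; model-corrected distances are generally preferred over raw p-distance for divergent sequences.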
The multiple sequence alignment results showed that charged amino acids in thermophilic enzymes gradually mutate to hydrophobic amino acids in mesophilic enzymes, which reduces the chance of forming salt bridges. In addition, "EIVINGMVFGEDGHKMSKSRGNV" was found to be an amino acid segment that includes a functional motif in thermophilic enzymes.

One purpose of studying the factors that influence protein thermostability is to increase the thermostability of mesophilic enzymes. The mesophilic lipase 1CVL was therefore mutated by homology modeling, and its thermostability was predicted by a support vector machine classifier. When many sites were mutated simultaneously, the molecule was predicted to be a hyperthermophilic protein sequence. Comparing the secondary and quaternary structure of the mutated molecule with those of the mesophilic enzyme showed negligible differences, which suggests that the mutated molecule has the same function as the mesophilic lipase 1CVL and that the mutations did not change the protein function. The significant difference between them is that the mutated molecule has more salt bridges and salt bridge networks.
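Salt bridges and salt bridge networks recur throughout the abstract, but the geometric criterion used in the thesis is not given. The sketch below assumes a common convention: a salt bridge exists when a side-chain oxygen of Asp or Glu lies within 4.0 Å of a side-chain nitrogen of Lys, Arg or His. The cutoff, the atom lists, the input format and the function names are all assumptions for illustration.

```python
import math
from collections import Counter

# Assumed charged side-chain atoms (PDB atom names).
ACIDIC_ATOMS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC_ATOMS = {"LYS": {"NZ"}, "ARG": {"NE", "NH1", "NH2"}, "HIS": {"ND1", "NE2"}}


def find_salt_bridges(atoms, cutoff=4.0):
    """Return the set of (acidic_residue_id, basic_residue_id) pairs in contact.

    atoms is an iterable of (residue_id, residue_name, atom_name, (x, y, z)),
    with coordinates in angstroms.
    """
    atoms = list(atoms)
    acidic = [(rid, xyz) for rid, res, name, xyz in atoms
              if name in ACIDIC_ATOMS.get(res, ())]
    basic = [(rid, xyz) for rid, res, name, xyz in atoms
             if name in BASIC_ATOMS.get(res, ())]
    return {(a_id, b_id)
            for a_id, a_xyz in acidic
            for b_id, b_xyz in basic
            if math.dist(a_xyz, b_xyz) <= cutoff}


def networked_residues(bridges):
    """Residues taking part in two or more salt bridges (assumed definition)."""
    degree = Counter(rid for pair in bridges for rid in pair)
    return {rid for rid, n in degree.items() if n >= 2}
```

Running both functions on the coordinates of a wild-type structure and of a mutant model would give the two counts being compared; the atom records themselves would have to be parsed from the structure files first.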
Keywords/Search Tags: NCBI COG database, archaeal protein, bacterial protein, thermostability, support vector machines, Bayesian approach, K-Nearest Neighbors, aminoacyl-tRNA synthetase, phylogenetic analysis, lipase, homology modeling