Prediction Of Protein Thermodynamic Stability Changes Upon Mutations Based On Local And Global Representations

Posted on: 2024-03-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J T Gong
Full Text: PDF
GTID: 1520307109981129
Subject: Computer application technology
Abstract/Summary:
The study of protein stability is of great significance in precision medicine and biomanufacturing. Proteins play a key role in organisms. Mutations affect protein stability, protein-protein interactions, and other functions, and can thereby cause many diseases. In recent years, researchers have proposed several methods to design protein structure and function, and to improve protein stability by simulating the protein folding process, which provides an important reference for modern drug development. In addition, protein enzymes play an important role in biomanufacturing processes in industries such as food, detergents, paper, metallurgy, and biofuels. Practitioners often need to find proteases with high stability, or to rationally design and modify existing proteases to meet the needs of specific application scenarios. Furthermore, the structural stability of proteins during natural evolution has a decisive influence on the discovery of proteins whose mutations lead to new functions. In summary, how to improve the stability of proteins with biological significance and application value has long been a research hotspot in related fields.

This dissertation focuses on the thermodynamic stability of proteins. The main objective is to use machine learning methods to predict the changes in protein thermodynamic stability caused by single-point mutations, based on local and global representations of the mutated proteins. First, to explore the representations related to thermodynamic stability, the differences in amino acid composition and evolutionary conservation between thermophilic and non-thermophilic proteins were analyzed, and a thermophilic protein classifier based on a protein language model was constructed to verify the validity of the global sequence features obtained from the protein language model. Based on the finding that a higher proportion of specific amino acids in thermophilic proteins can improve protein stability, a predictor of thermodynamic stability changes upon single-point mutation was constructed from the differences in the local representations of amino acids before and after mutation. Based on the finding that embeddings from the protein language model contribute to a global sequence representation, a second predictor of thermodynamic stability changes was constructed from the differences in the global features of proteins before and after mutation.

Following these research ideas, the author studies amino acids associated with high-temperature resistance and high stability and carries out thermophilic/non-thermophilic protein classification. The sequence-based predictors established in this dissertation do not rely on crystal structures to predict the protein stability changes caused by mutations. The author proposes a machine learning-based thermophilic protein classifier (PLM-Thermo) and three tools for predicting protein stability changes that use different feature representation methods (MU3DSP, MU3DSP-S5296, and THPLM), and has established a web server based on the three methods. The main research work is as follows:

(1) For the identification of thermophilic proteins with high-temperature resistance and high stability, the PLM-Thermo classifier, which combines a protein language model with the ensemble learning algorithm LightGBM, is proposed and achieves highly accurate identification of thermophilic proteins. Traditional thermophilic protein identification methods mainly build a machine learning predictor from features such as amino acid frequency, physicochemical properties, and coevolution information derived from multiple sequence alignments. However, such methods do not take into account the interactions within the protein structure, such as hydrogen bonds, salt bridges, and other factors that contribute to protein stability. To address this, the author obtained the internal structure information of the protein from the protein language model and combined it with the LightGBM algorithm to build a thermophilic protein predictor. The results show that the protein language model represents the sequence better and improves the classification accuracy for thermophilic proteins. Next, after analyzing the differences in amino acid composition and evolutionary conservation between thermophilic and non-thermophilic proteins, the author finds, first, that the frequency of some amino acids in thermophilic proteins is related to protein stability and, second, that some motifs arising during evolution are non-contiguous in the sequence. Both findings guide the prediction of protein thermodynamic stability changes caused by amino acid mutations.

(2) For the prediction of protein stability changes caused by single-point mutations, and centered on the local representation of the mutated amino acid, this dissertation proposes two tools for predicting stability changes upon single-point mutation based on the 3D structure profile: MU3DSP and MU3DSP-S5296. Both tools predict the stability changes caused by amino acid mutations from the sequence alone and do not require the protein crystal structure. To construct MU3DSP, the author integrated the physicochemical properties of amino acids, the evolutionary information of proteins, and the structure-based features from the 3D structure profile, and combined them with the LightGBM algorithm. To address the problem of model imbalance, this dissertation proposes the model MU3DSP-S5296, which is trained on a balanced dataset. The results show that MU3DSP and MU3DSP-S5296 are strong and stable models, and that MU3DSP-S5296 is also a balanced model. They also show that the 3D structure profile features of the mutated amino acid proposed in this dissertation help improve the performance of predictors of stability changes upon single-point mutation.

(3) For the global sequence-based representation of mutant proteins, this dissertation proposes THPLM, a deep learning framework based on a protein language model for predicting the stability changes caused by single-point mutations. Unlike MU3DSP, THPLM is not limited to the local features of the mutated amino acid but starts from the global sequence-based representation provided by the protein language model. The sequence representations obtained from the protein language model are used to calculate the differential representation between the variant and the wild-type protein, and this differential representation is used to build the predictor of protein thermodynamic stability changes upon single-point mutation. THPLM shows good performance on multiple single-point mutation datasets and can be transferred to study the effect of multi-point mutations on protein stability; for example, it achieved high prediction accuracy on multi-point mutations of the thermophilic enzyme 3-isopropylmalate dehydrogenase.
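The differential representation between a variant and its wild-type protein can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the THPLM implementation: the function names, the mutation notation (for example `A4G`), and in particular `toy_embed` are hypothetical. `toy_embed` is a stand-in that returns amino acid composition, where a real pipeline would return an embedding from a protein language model.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def apply_mutation(sequence: str, mutation: str) -> str:
    """Apply a single-point mutation written as wild type, 1-based position, mutant (e.g. 'A4G')."""
    wild, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild:
        raise ValueError(f"expected {wild} at position {position}, found {sequence[position - 1]}")
    return sequence[:position - 1] + mutant + sequence[position:]


def toy_embed(sequence: str) -> np.ndarray:
    """Hypothetical stand-in for a protein language model embedding.

    Returns the amino acid composition (a 20-dimensional global feature vector).
    """
    counts = np.array([sequence.count(aa) for aa in AMINO_ACIDS], dtype=float)
    return counts / len(sequence)


def differential_representation(wild_type: str, mutation: str, embed=toy_embed) -> np.ndarray:
    """Global representation of the variant minus that of the wild type."""
    variant = apply_mutation(wild_type, mutation)
    return embed(variant) - embed(wild_type)


wild_type = "MKTAYIAK"  # made-up sequence, for illustration only
delta = differential_representation(wild_type, "A4G")
print(delta)
```

In this sketch the resulting vector would be the input feature to a downstream stability-change predictor. Passing a different `embed` function swaps the toy composition features for a learned representation without changing the rest of the code.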
Keywords/Search Tags: Protein stability, Thermophilic protein, Single-point mutation, Protein representation, Protein language model, 3D structure profile