
Study of Metal-Ion Binding Sites for Disease-Associated Proteins

Posted on: 2024-07-17  Degree: Master  Type: Thesis
Country: China  Candidate: X H Zou  Full Text: PDF
GTID: 2530307139485034  Subject: Biophysics
Abstract/Summary:
The study of protein-ligand interactions has become one of the most important parts of research into protein function and mechanism of action, because many protein functions depend on the binding of specific ligands. Among these, the binding of metal ion ligands plays an important role in complex formation, structural stability, metabolic control and the development of disease. To understand the molecular mechanism behind protein-metal ion interactions, the first step is to determine which residues in the protein bind the metal ion ligand. Identifying metal ion ligand-binding residues (binding sites) experimentally is time-consuming and costly and cannot meet the needs of large-scale applications, so theoretical prediction methods have become particularly important. At present, there are two main approaches to studying protein-ligand binding: methods based on protein sequence information and methods based on protein structure information. However, most proteins lack a determined three-dimensional structure, so their structural information cannot be obtained. It is therefore necessary to study the binding of proteins and metal ion ligands from the perspective of protein sequence information.

In this thesis, we focused on three kinds of disease-related proteins and built a model that predicts the binding sites between these proteins and metal ion ligands from sequence features combined with machine learning algorithms. First, the dataset was constructed from the annotation information in the UniProt database, from which proteins related to three disease classes (cardiovascular diseases, neurodegenerative diseases and cancer) were obtained. Then, the binding site information for these proteins and three metal ion ligands (Ca2+, Mg2+ and Zn2+) was obtained from the BioLiP database. Finally, a sliding window was used to capture the influence of each binding site together with its adjacent residues, and random sampling was used to address the imbalance between the positive and negative datasets.

Analysis of variance showed significant differences in the amino acid distributions at the binding sites and non-binding sites of the three metal ion ligands. Seven sequence features were then extracted: position-specific scoring matrix (PSSM), amino acid composition (AAC), dipeptide composition (DC), polar amino acid composition (P-AAC), polar amino acid dipeptide composition (P-DC), hydrophilic amino acid composition (H-AAC) and hydrophilic amino acid dipeptide composition (H-DC). Using these features and their combinations as parameters, the random forest algorithm and the support vector machine algorithm were used to build classification models for the binding sites of the three metal ion ligands.

The results of five-fold cross-validation were as follows. In the binary prediction that distinguishes binding sites from non-binding sites for each of the three metal ions, comparing feature fusion with single features showed that feature fusion improved the overall prediction accuracy to varying degrees. With feature fusion, the highest accuracy (Acc) reached 87% for Zn2+ binding sites, 71% for Ca2+ binding sites and 70% for Mg2+ binding sites. The most effective single feature was the position-specific scoring matrix (PSSM). Finally, using the above features, we further distinguished among the binding sites of the three metal ions (three-class prediction), where the best overall accuracy (OA) reached 82%. These results indicate that the model has some ability to recognize the binding sites of the three metal ion ligands.
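The data-preparation steps described in the abstract (a sliding window around each residue, random sampling to balance the classes, and a composition feature such as AAC) can be illustrated with a minimal sketch. This is not the thesis implementation: the window half-width of 4, the padding symbol `X`, the toy sequence and its labels, and all function names are assumptions chosen only for illustration, and the sampling shown is plain random undersampling of the negative class.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for positions beyond the termini


def sliding_windows(sequence, labels, half_width=4):
    """Return one (window, label) pair per residue.

    Each window is centred on the residue and spans half_width
    neighbours on both sides; the half-width is an assumed value.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have equal length")
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    return [(padded[i:i + size], label) for i, label in enumerate(labels)]


def aac(window):
    """Amino acid composition: frequency of each standard residue."""
    counts = Counter(window)
    return [counts[a] / len(window) for a in AMINO_ACIDS]


def undersample(samples, seed=0):
    """Randomly drop negatives until both classes have equal size."""
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    if len(negatives) > len(positives):
        negatives = rng.sample(negatives, len(positives))
    balanced = positives + negatives
    rng.shuffle(balanced)
    return balanced


# Toy example: 1 marks a hypothetical metal-binding residue.
sequence = "MKCDHEACGHLLKDE"
labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]

samples = sliding_windows(sequence, labels, half_width=4)
balanced = undersample(samples, seed=0)
features = [aac(window) for window, _ in balanced]
targets = [label for _, label in balanced]
```

The resulting `features` and `targets` lists are in the shape a classifier such as a random forest or support vector machine would expect; richer features such as PSSM would require an external alignment tool and are left out of this sketch.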
Keywords/Search Tags: Metal ion ligand, Five-fold cross-validation, Position-specific scoring matrix (PSSM), Analysis of variance, Random forest, Support vector machine