
Recognition Of Ten Metal Ion-binding Residues Based On Sequence Information

Posted on: 2018-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Cao
GTID: 2348330536979438
Subject: Physical Electronics
Abstract/Summary:
Metal ions play an important role in biological processes. More than one-third of protein structures contain metal ions, which are involved in enzyme catalysis, maintenance of protein structure, and regulation. These functions are achieved through the interaction of proteins with metal ion ligands. Identifying the metal ion-binding residues in a protein is therefore particularly important and provides guidance for the design of molecular drugs. In this work, metal ion-binding residues in proteins were recognized based on sequence information. The main work is as follows:

(1) The binding-residue datasets for ten metal ions (Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+) were taken from the BioLiP database, with resolution better than 3 Å, sequence length greater than 50 residues, and sequence identity below 30%. Based on the sliding window method, the optimal window lengths for the ten metal ions are 7, 13, 9, 9, 9, 9, 7, 9, 11 and 11, respectively.

(2) A statistical analysis of the ten metal ion-binding residue datasets was carried out, and the amino acid positions were found to be highly conserved. Using this position conservation information, the binding residues of the ten metal ions were identified with the Position Weight Scoring Function algorithm. Under 5-fold cross-validation, the overall prediction accuracy is higher than 62.7% and the Matthews correlation coefficient (MCC) is higher than 0.335. The results for five metal ions (Zn2+, Cu2+, Fe2+, Fe3+ and Co2+) fit the ideal case well, while those for the other five (Ca2+, Mg2+, Mn2+, Na+ and K+) deviate slightly from it.

(3) To improve the recognition results, we studied the biological background of the binding residues and extracted amino acid composition, hydrophilicity and hydrophobicity, polarization charge, secondary structure, and solvent accessibility information as feature parameters. To avoid overfitting of the Support Vector Machine (SVM), the dimensionality of some features was reduced with the Position Weight Scoring Function and increment of diversity algorithms, and the SVM was then used to identify the binding residues of all ten metal ions. Under 5-fold cross-validation, the overall prediction accuracy is higher than 74.8% and the MCC is higher than 0.502. The sensitivity of the binding residues of each metal ion ligand to the feature parameters was analyzed by training the SVM on different combinations of feature parameters. To enhance the practicability of the model, the metal ion datasets were also tested independently; compared with previous studies, the prediction trend is consistent with earlier results.

(4) The Random Forest algorithm was introduced into this study, using the same combination of feature parameters as the SVM. Under 5-fold cross-validation, its overall recognition result is slightly lower than that of the SVM, although it is slightly better than the SVM in some cases, especially for the Ca2+, Mg2+ and Mn2+ ion ligands.

(5) We set up a prediction platform: an online web service for predicting metal ion ligand binding residues. It is freely available to all users to facilitate related research.
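The sliding window step in (1) and the accuracy and MCC figures quoted in (2) and (3) can be illustrated with a minimal sketch. It is not the thesis's code: the function names, the padding symbol `X` for positions beyond the sequence ends, and the example counts are all assumptions made for illustration, and the abstract does not say how sequence ends were handled.

```python
import math

PAD = "X"  # assumed padding symbol for positions past either end of the sequence


def sliding_windows(sequence, window):
    """Return one fragment of length `window` centered on each residue."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    # Fragment i is centered on residue i of the original sequence.
    return [padded[i:i + window] for i in range(len(sequence))]


def accuracy(tp, tn, fp, fn):
    """Overall prediction accuracy from confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0 when a row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical five-residue sequence, window length 3.
    print(sliding_windows("MKCHE", 3))
    # Hypothetical counts: binding residues are the positive class.
    print(accuracy(tp=40, tn=45, fp=5, fn=10), mcc(tp=40, tn=45, fp=5, fn=10))
```

Each fragment would then be labeled by whether its central residue binds the metal ion, so a window length of 7 for Zn2+ means three neighbors on each side of the candidate residue.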
Keywords/Search Tags:Metal ion ligand, Binding residue, Sequence information, Predicted structure information, Position Weight Scoring Function, Support Vector Machine, Random Forest