Identification of Metal Ion Ligand Binding Residues in Proteins Based on the GBM Algorithm

Posted on: 2020-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: X J Zhang
GTID: 2370330590459751
Subject: Mathematics
Abstract/Summary:
Proteins play a critical role in life processes, carrying out different specialized functions in different processes. Many protein functions, however, require binding to specific ligands, and more than one-third of proteins need to bind metal ion ligands. Metal ion ligands are therefore important to protein function, and correctly identifying metal ion ligand binding residues is valuable for human health and molecular drug design. Experimental methods are time-consuming and costly, among other limitations, and cannot process data in batches, so accurately identifying the binding residues of metal ion ligands by computational methods is particularly important. In addition, only a small fraction of sequenced proteins have known 3D structures. This thesis therefore performs statistical analysis and prediction of metal ion ligand binding residues based on protein sequence information. The main work is as follows:

(1) The binding residues of ten metal ion ligands (Zn2+, Cu2+, Fe2+, Fe3+, Co2+, Ca2+, Mg2+, Mn2+, Na+ and K+) were studied. Based on previous studies and the biological background, amino acid residues, hydrophilicity-hydrophobicity, polarization charge, predicted secondary structure and relative solvent accessibility were selected as feature parameters. Relative solvent accessibility was reclassified on the basis of statistical analysis, yielding four different classifications (SA2, SAV, SAP, SA4).

(2) Based on the position conservation information of amino acids, hydrophilicity-hydrophobicity, polarization charge, secondary structure and relative solvent accessibility, 2L-dimensional feature parameters were obtained for each by using the position weight matrix. The 5×2L-dimensional feature parameters corresponding to each of the four classifications were input into the Gradient Boosting Machine (GBM) to predict the binding residues of the 10 metal ion ligands. From the optimal prediction results, the optimal classification of relative solvent accessibility was obtained for each of the ten metal ion ligands. The results of 5-fold cross-validation were better than previously reported ones: the Matthews correlation coefficient (MCC) values were higher than 0.558 and the accuracy (Acc) values were higher than 77.9%. Similar prediction results were obtained with subset features, indicating that the model is stable. The practicability of the proposed method was verified with an independent test. In addition, the experimental results showed that the prediction model had better ability to identify metal ion ligand binding residues.

(3) The dimensionality of the composition information and position information of the feature parameters was reduced using the increment of diversity algorithm and the position weight scoring matrix algorithm, yielding 20-dimensional combination information. This combination information was input into the GBM under optimized settings, and the optimal algorithm parameters and prediction results were determined for each of the 10 metal ion ligands. The binding residues of the 10 metal ion ligands were also predicted using the 5×2L-dimensional feature parameters with the optimized GBM settings. The results further indicated that optimizing the GBM algorithm parameters is important.
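
The position weight matrix features and the MCC and Acc measures mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that the 2L-dimensional features come from scoring a length-L sequence window against one matrix built from binding windows and one built from non-binding windows, that the background distribution is uniform, and that a pseudocount of 1 is used. The function names (`build_pwm`, `pwm_features`, `mcc`, `accuracy`) and the toy windows are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_pwm(windows, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length windows.

    Assumptions (not taken from the thesis): uniform background
    frequencies and an additive pseudocount.
    """
    length = len(windows[0])
    background = 1.0 / len(alphabet)
    total = len(windows) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        pwm.append({
            a: math.log((counts[a] + pseudocount) / total / background)
            for a in alphabet
        })
    return pwm


def pwm_features(window, pos_pwm, neg_pwm):
    """Score one window against both matrices, giving 2L features."""
    pos_scores = [pos_pwm[i][a] for i, a in enumerate(window)]
    neg_scores = [neg_pwm[i][a] for i, a in enumerate(window)]
    return pos_scores + neg_scores


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Toy windows of length L = 5 (invented for illustration only).
binding = ["ACHCD", "GCHCE", "TCHDK"]
non_binding = ["ALGVA", "GLAVK", "SLGIA"]

pos_pwm = build_pwm(binding)
neg_pwm = build_pwm(non_binding)
features = pwm_features("ACHCE", pos_pwm, neg_pwm)
print(len(features))  # 2L = 10 values for one feature type
```

Repeating this for the five feature types (with the alphabet replaced by the categories of each type) would give a 5×2L-dimensional vector per residue, which could then be passed to any gradient boosting implementation and evaluated with `mcc` and `accuracy` on each cross-validation fold.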
Keywords: Metal ion ligand, Binding residues, Gradient Boosting Machine algorithm, Relative solvent accessibility, Combination information, Optimal algorithm parameters