Prediction Of Protein-Ligand Binding Residues Using Sequence Information And Extreme Gradient Boosting

Posted on:2021-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:Z Q Zhao

Full Text:PDF

GTID:2381330611471502

Subject:Biomedical engineering

Abstract/Summary:

Interactions between proteins and ligand molecules carry out and regulate specific functions in living systems. Predicting protein–ligand binding residues is important for understanding protein function, for drug discovery, and for drug design, and it is a shared focus of life science and computer science. Computational methods for predicting protein–ligand binding residues are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which combines the synthetic minority over-sampling technique (SMOTE) with Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix-discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. SMOTE generates a new balanced dataset to improve classifier performance, and XGBoost is used to construct the prediction model. The parallel computing and regularization techniques of XGBoost enable fast, high-quality predictions and mitigate the overfitting caused by SMOTE. An evaluation on independent test sets covering 12 different types of ligand-binding residues showed that SXGBsite achieves excellent five-fold cross-validation results on ten of the training sets and performs similarly to existing methods on eight of the independent test sets with a faster computation time; the difference between the area under the receiver operating characteristic curve (AUC) of SXGBsite and the best AUC was within 0.020. SXGBsite may be applied as a complement to biological experiments.
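
The SMOTE balancing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the SXGBsite implementation: the function name `smote`, the neighbour count `k`, and the two-dimensional toy vectors are assumptions chosen for illustration, whereas the thesis works on PSSM-DCT and PSA features. The sketch only shows the core idea of SMOTE, which is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 4 binding residues against 12 non-binding ones.
binding = [[0.9, 0.8], [1.0, 1.0], [0.8, 1.1], [1.1, 0.9]]
non_binding_count = 12
new_points = smote(binding, non_binding_count - len(binding), k=3)
balanced_binding = binding + new_points
```

After this step the binding class has as many samples as the non-binding class, and a classifier such as XGBoost would be trained on the balanced set. Because every synthetic point lies between two real minority samples, the new data stay inside the region already occupied by the minority class, which is also why over-sampling of this kind can encourage overfitting.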

Keywords/Search Tags:

protein–ligand binding residue, synthetic minority over-sampling technique (SMOTE), Extreme Gradient Boosting (XGBoost), discrete cosine transform (DCT), discrete wavelet transform (DWT)

Related items

1	Analyzing Similarity Of Protein Sequences With Discrete Wavelet Transform
2	Study On The Methods Of 1H-NMR And FTIR Spectroscopy Combined With Chemometrics Application To Traditional Chinese Medicine Identification
3	Research On Enhancement And Recognition Of Mongolian Furniture Patterns Based On Singular Values And Gamma Functions In Frequency Domain
4	Study On The Methods Of Infrared Spectroscopy Combined With Chemometrics Application To Chinese Medicine Identification
5	Identification Research And Application For Protein Post-translational Modification Sites
6	Studies On Fundamental Issues Of Near-Infrared Spectroscopy: In-Line Analysis, Multi-Component Analysis And Spatial Effect
7	Add New Feature Parameters To Identify Protein Metal Ion Ligand Binding Residues
8	The Application Of The Wavelet Transform In The Digital Watermarking Technique
9	Research On Fault Diagnosis Of Industrial Imbalanced Data Based On Manifold Learning
10	The Theory Of The Color Image Watermarking And Applied Research