| The interaction of proteins and ligand molecules achieves and regulates specific functions in life activities. The prediction of protein–ligand binding residues is important for understanding protein function, drug discovery, and drug design, and is a common focus of life science and computer science. Computational methods for predicting protein–ligand binding residues are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which combines the synthetic minority over-sampling technique (SMOTE) and Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix-discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. A new balanced dataset was generated by SMOTE to improve classifier performance, and a prediction model was constructed using XGBoost. The parallel computing and regularization techniques of XGBoost enabled high-quality and fast predictions and mitigated the overfitting caused by SMOTE. An evaluation using independent test sets for 12 different types of ligand binding residues showed that SXGBsite achieves excellent five-fold cross-validation results on ten of the training sets and performs similarly to the existing methods on eight of the independent test sets with a faster computation time; the difference between the area under the receiver operating characteristic curve (AUC) of SXGBsite and the best AUC was within 0.020. SXGBsite may be applied as a complement to biological experiments. |
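
The balancing step described in the abstract can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' implementation: the function name `smote`, its parameters, and the two-dimensional toy vectors are assumptions made for illustration. In practice the feature vectors would be the PSSM-DCT and PSA features, and the balanced set would then be passed to an XGBoost classifier, a step omitted here.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    minority: list of feature vectors (lists of floats) from the minority class.
    n_synthetic: number of synthetic vectors to generate.
    k: number of nearest minority neighbours to choose from (hypothetical default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 3 binding-residue vectors versus 8 non-binding ones.
binding = [[0.9, 0.8], [1.0, 1.1], [0.8, 1.0]]
non_binding = [[0.1 * i, 0.05 * i] for i in range(8)]

extra = smote(binding, len(non_binding) - len(binding), k=2)
balanced_binding = binding + extra
print(len(balanced_binding), len(non_binding))
```

Because every synthetic vector is a convex combination of two real minority vectors, the oversampled class stays within the region already covered by the binding residues. That closeness to the training data is also why a regularized classifier is a natural companion: it limits the overfitting the abstract attributes to SMOTE.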