| Drug discovery is a long and complex process. According to statistics, bringing a new drug from development to market usually takes more than ten years and costs 1 billion dollars. With the rapid development of computer technology and the expansion of accessible databases, computer-aided drug design (CADD) plays an increasingly important role. Predicting binding affinity from the structures of the chemical molecule and the target in order to identify lead compounds is the core technology of CADD. The accuracy of the scoring function embedded in a docking program therefore affects the results of virtual screening and, in turn, the success rate of later experimental validation. In recent years, with the widespread application of artificial intelligence (AI), applying machine learning (ML) and deep learning (DL) algorithms to the design of scoring functions has become a research hotspot. Numerous studies have shown that ML/DL-based scoring functions are more accurate than traditional methods for binding energy prediction. In this thesis, we propose two scoring functions, OnionNet-2 and DeepRMSD+Vina, for predicting protein-ligand interactions based on physical knowledge combined with DL algorithms. Combined with other scoring functions, the proposed methods can achieve higher prediction accuracy. The main research contents of this thesis are as follows: 1. We designed a scoring function named OnionNet-2, based on a convolutional neural network (CNN), for predicting protein-ligand binding affinity. It describes protein-ligand interactions through the number of residue-atom contacts in different distance intervals. With the CASF-2016 core set as the test set, the Pearson correlation coefficient (R) achieved by OnionNet-2 reached a local maximum when the distance threshold between residues and atoms was increased to 1.55 nm; when the distance threshold was increased to 3.05 nm, R reached a global maximum of 0.864 with a root mean square error (RMSE) of
1.164. With the CASF-2013 core set as the test set, OnionNet-2 achieved an R of 0.821 and an RMSE of 1.357. Our model outperforms almost all reported scoring functions on both datasets. Furthermore, OnionNet-2 also performed well on the CSAR NRC-HiQ dataset and on non-experimental structures, demonstrating its strong generalization ability. 2. We modeled protein-ligand interactions based on van der Waals and electrostatic potentials, and employed a CNN to design a scoring function named DeepRMSD for predicting the RMSD of a ligand binding pose. This scoring function aims to select, from the binding poses generated by docking programs, those that are close to the native conformation. We also combined DeepRMSD with a traditional scoring function (AutoDock Vina) to obtain a new scoring function called DeepRMSD+Vina. In our tests, DeepRMSD+Vina achieved a higher docking success rate. In the CASF-2016 docking power test, the Top 1 success rate achieved by DeepRMSD+Vina was 95.4%, well ahead of the second-ranked AutoDock Vina (90.2%). Based on the DeepRMSD+Vina scoring function, we proposed a ligand conformation optimization framework to improve the quality of binding poses generated by molecular docking. On the CASF-2016 docking poses, the optimization success rate of this framework exceeds 70% for docking poses with RMSD less than 3 Å. In two practical application scenarios (redocking and cross-docking tasks), DeepRMSD+Vina combined with this optimization framework can greatly improve the docking success rate. Therefore, our proposed scoring function and ligand optimization framework have high practical value. Structural analysis shows that the optimization framework is able to restore hydrogen bonds. In this thesis, we systematically expound the physical concepts, modeling features and prediction accuracy of OnionNet-2 and DeepRMSD+Vina, which provide a fast and accurate solution for virtual screening based
on ML algorithms. |
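
The idea of describing a complex by the number of residue-atom contacts in different distance intervals can be illustrated with a small sketch. This is not the thesis implementation: the function name `shell_contact_counts`, the use of the minimum atom-atom distance as the residue-atom distance, the shell edges, and the dictionary keyed by (shell, residue type, ligand element) are all assumptions made only for illustration.

```python
import numpy as np


def shell_contact_counts(residue_coords, residue_types,
                         ligand_coords, ligand_elements, shell_edges):
    """Count residue-atom contacts per distance shell (illustrative only).

    residue_coords  : list of (n_i, 3) arrays, one array of atom coordinates per residue
    residue_types   : list of residue names, same length as residue_coords
    ligand_coords   : (m, 3) array of ligand atom coordinates
    ligand_elements : list of m element symbols
    shell_edges     : increasing distances; shell k covers
                      [shell_edges[k], shell_edges[k + 1])  (hypothetical choice)

    Returns a dict mapping (shell index, residue type, element) -> contact count.
    """
    ligand_coords = np.asarray(ligand_coords, dtype=float)
    edges = np.asarray(shell_edges, dtype=float)
    counts = {}
    for coords, res_type in zip(residue_coords, residue_types):
        coords = np.asarray(coords, dtype=float)
        # Pairwise distances between residue atoms and ligand atoms: shape (n_i, m).
        diff = coords[:, None, :] - ligand_coords[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        # Assumed residue-atom distance: closest residue atom to each ligand atom.
        min_dist = dist.min(axis=0)
        # Index of the shell each distance falls into (may be out of range).
        shell_idx = np.searchsorted(edges, min_dist, side="right") - 1
        for k, elem in zip(shell_idx, ligand_elements):
            if 0 <= k < len(edges) - 1:  # ignore pairs beyond the last edge
                key = (int(k), res_type, elem)
                counts[key] = counts.get(key, 0) + 1
    return counts


# Toy input: one two-atom residue and three ligand atoms (made-up coordinates, in nm).
toy_counts = shell_contact_counts(
    residue_coords=[[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]],
    residue_types=["ALA"],
    ligand_coords=[[0.3, 0.0, 0.0], [0.0, 0.7, 0.0], [5.0, 0.0, 0.0]],
    ligand_elements=["C", "O", "N"],
    shell_edges=[0.0, 0.5, 1.0],
)
```

In a sketch like this, enlarging the outermost shell edge corresponds to raising the distance threshold discussed in the abstract; the resulting counts would then be arranged into a fixed-size array before being passed to a CNN.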