Prediction Of Protein-Ligand Interactions Through Artificial Intelligence- And Physics-Based Approaches

Posted on:2024-01-31

Degree:Master

Type:Thesis

Country:China

Candidate:S K Gu

Full Text:PDF

GTID:2544307163477784

Subject:Pharmaceutical

Abstract/Summary:

Computer-aided drug design (CADD) plays an increasingly important role in the discovery of lead compounds in the early stages of drug development. Accurately describing and predicting protein-ligand interactions is a key scientific problem in CADD: it bears directly on the discovery of active molecules for specific targets and on the evaluation of their selectivity and off-target effects, and it partly determines how efficiently high-quality lead compounds can be found. Traditional physics-based methods for calculating protein-ligand interaction strength struggle to deliver satisfactory results in both predictive accuracy and efficiency. In recent years, with the revival of artificial intelligence and the advent of the big-data era, statistics-based machine learning (ML) methods for predicting protein-ligand interactions have attracted much attention. Across multiple application scenarios, these methods have shown great potential in efficiency, accuracy and other aspects of performance, but there is still room for improvement. In practical drug design, an accurate structure of the protein-ligand complex is often lacking. Current ML-based protein-ligand affinity prediction rarely considers protein flexibility and therefore fails to complement physics-based methods effectively. Prediction accuracy is also limited for protein families with similar pocket structures, which makes it difficult to find selective or specific active molecules and to evaluate off-target effects for important target families such as epigenetic targets. In this thesis, we systematically evaluate the influence of protein dynamic structure information on protein-ligand affinity prediction, and develop AMGC, an epigenetic target inhibition profile prediction model based on multi-task learning and contrastive learning.

In the second chapter, we systematically explore whether introducing structural dynamic information can improve the accuracy of ML-based protein-ligand affinity prediction. Most current ML scoring functions start from a single static structure (a docking structure or a crystal structure) and thus ignore the dynamic protein-ligand binding process and its structural flexibility. In practical drug design, the crystal structure of the protein-ligand complex is often lacking, and the accuracy of docking structures is limited. Traditional molecular dynamics (MD) simulations can account for protein flexibility to some extent through time-dependent sampling of protein-ligand complexes. However, it is still unclear whether and how MD sampling can improve the predictive performance of ML scoring functions. In this chapter, we select three targets with different structural flexibility (JAK1, TAF1-BD2 and DDR1), together with their active molecules, to construct the research systems, and adopt three classical ML algorithms (RF, SVR and XGB). We evaluate the effect of structural dynamic information obtained from MD trajectories on affinity prediction by comparing ML models trained on descriptors of different dimensions, and we also compare the affinity prediction performance of the ML models with that of two common virtual screening methods (Glide and MM/PBSA). The results show that, when the docking structure is used as the initial structure for MD, incorporating structural dynamic information affects targets of different flexibility differently. For TAF1-BD2, the most flexible of the three targets, the accuracy of affinity prediction can be improved. For the more stable JAK1, adding dynamic structure information has little effect on model performance. For DDR1, whose ligand-induced conformations differ relatively widely, including insufficiently sampled information may even impair model performance. In addition, conformational analysis of the three target systems highlights the importance of the initial MD structure for the final predictions of the models.

In the third chapter, we develop AMGC, a deep learning model based on multi-task learning and contrastive learning, to predict the inhibition profiles of small-molecule compounds against 67 epigenetic targets for selectivity and specificity evaluation of active molecules. Epigenetic targets have attracted much attention in drug development. In recent years, the rapid accumulation of biological data in epigenetics has created opportunities for deep learning-based prediction of the epigenetic target inhibition profiles of active molecules. On both internal and external test sets, AMGC achieves better results than classical machine learning models based on molecular fingerprints and deep learning models based on other graph neural networks. In addition, AMGC is superior to ETP, the previous state-of-the-art (SOTA) model, in terms of both actual performance and algorithmic mechanism. During training, adaptive learning and contrastive learning strategies are introduced to help the model explore a larger hyperparameter space with limited computational resources and, to a certain extent, to prevent it from becoming trapped in local minima. Four representative compounds were further selected for model interpretability analysis. The results show that, judging by the weight coefficients the model assigns, AMGC learns which atoms interact strongly with residues around the binding pocket and captures small structural changes in the compounds. Finally, we developed a user-friendly website for epigenetic target fishing (http://cadd.zju.edu.cn/amgc).
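The multi-task setting described for AMGC, with one prediction per epigenetic target for every compound, can be illustrated with a small loss function. The sketch below is not taken from the thesis: it assumes, hypothetically, that each compound has measured activity for only some of the targets and that unmeasured targets are simply masked out of a binary cross-entropy loss. The function name masked_multitask_bce and the toy inputs are illustrative only.

```python
import math


def masked_multitask_bce(logits, labels):
    """Mean binary cross-entropy over labeled (compound, target) pairs only.

    logits: list of rows, one row per compound, one raw score per target.
    labels: same shape; 1 = inhibitor, 0 = non-inhibitor, None = not measured.
    Assumption (not from the thesis): unmeasured targets are masked out.
    """
    total, count = 0.0, 0
    for logit_row, label_row in zip(logits, labels):
        for z, y in zip(logit_row, label_row):
            if y is None:
                # Unmeasured target: contributes nothing to the loss.
                continue
            # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)],
            # where s = sigmoid(z).
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    # With no labeled pairs at all, return 0.0 rather than dividing by zero.
    return total / count if count else 0.0


# Toy batch: 2 compounds, 3 targets (the thesis model covers 67 targets).
toy_logits = [[0.0, 2.0, -1.0],
              [1.5, -0.5, 0.2]]
toy_labels = [[1, None, 0],
              [None, 0, 1]]
print(masked_multitask_bce(toy_logits, toy_labels))
```

Averaging over labeled pairs only keeps sparsely annotated targets from being swamped by missing values. A real model would compute the same quantity on tensors inside a deep learning framework, alongside whatever contrastive term the training scheme uses.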

Keywords/Search Tags:

Molecular Dynamics Simulation, Scoring Function, Machine Learning, Structural Flexibility, Multi-task Learning, Contrastive Learning, Interpretability, Interaction Prediction

Related items

1	Methodological Study On Machine Learning-Based Scoring Functions For Protein-Ligand Docking
2	Establishment And Application Of CB2 Ligand Prediction Model Based On Machine Learning
3	Bioinformatics Analysis And Molecular Dynamics Simulation Of Structural Features Of Regulatory SNP
4	Research On Prognosis Prediction Model And Its Interpretation Of Stroke Based On Online Learning
5	Signaling Adverse Drug Reactions Via Machine Learning Methods
6	Research On Liver Cancer Prediction Model Based On Interpretable Machine Learning
7	A Preliminary Study On The Construction Of Interpretable Prostate Biopsy Prediction Model Based On SHAP And Machine Learning
8	Research On Drug Interaction Prediction Method Based On Knowledge Graph Interpretability
9	Research On Conformation Consistency Of Molecular Docking Based On Machine Learning And Prediction Of Interaction Mode Of MHTT/Molecular Glue/LC3 Ternary Complex
10	Study On AI-based Scoring Functions And Development Of Computational Platform