Molecular Representation Based Approaches And Applications For Drug Discovery

Posted on: 2022-03-07; Degree: Doctor; Type: Dissertation
Country: China; Candidate: D Wang
GTID: 1484306566964449; Subject: Bioinformatics
Abstract/Summary:
The traditional drug research and development (R&D) paradigm faces long cycle times, high costs, and low success rates. There is an urgent need for breakthrough technologies that increase the efficiency of drug discovery and reduce drug-development costs. With the rapid accumulation of data in the post-genomic era and the development of big data technologies, data-driven cheminformatics methods have added impetus to biopharmaceutical R&D. The fundamental question in cheminformatics is how to obtain informative molecular representations, since an effective molecular representation improves the efficiency of chemical space exploration. In this thesis, previous molecular representation methods are classified into three kinds: feature engineering based methods, deep learning based methods, and biological activity data based methods. The thesis concentrates on the development and application of new molecular representations for virtual screening and QSAR studies. Machine learning algorithms were used in combination with molecular simulation methods. The research was carried out at two levels: one drug-one target and one drug-multiple targets.

(1) In the first section, we evaluated the applicability of the "Three-Dimensional Biologically Relevant Spectrum (BRS-3D)" for identifying subtype-selective inhibitors. A case study was performed on monoamine oxidase (MAO), which has two subtypes related to distinct diseases. Among the 130 tested compounds, 104 demonstrated moderate inhibition (inhibition > 50%) and 69 had significant inhibition (inhibition > 70%). Notably, 1 compound was identified as a selective MAO-A inhibitor with IC50 below 100 nM, and 8 compounds were MAO-B inhibitors. Similarity search and synthesis based on these active compounds resulted in the discovery of 217 derivatives. The molecular basis for subtype selectivity was explored through docking, molecular dynamics simulation, and an attention-based deep neural network (DNN) model. The results indicate that BRS-3D is a robust method for addressing subtype selectivity in the early stage of drug discovery, and that the compounds reported here are promising leads for further experimental analysis.

(2) The second section tests the efficiency of different molecular representations within a multi-task deep neural network (MTDNN) framework, including ECFP4, MACCS, MOE-2D, BRS-3D, the SMILES Feature Matrix, and hybrid representations. A case study was performed on a database of 181 representative GPCRs. The GPCR datasets were split into training, validation, and test sets according to molecular weight in order to test MTDNN performance under challenging conditions. Most MTDNN models showed good predictive ability on the internal test set. Specifically, the ECFP4-MTDNN model had the best prediction performance on both the internal and the external test sets. Combining the SMILES Feature Matrix with ECFP4 produced better results than the SMILES Feature Matrix alone. Additionally, we compared the MTDNN model with single-task learning methods, including random forest (RF) and support vector machine (SVM). The ECFP4-MTDNN model was overall superior to these state-of-the-art single-task methods, as indicated by its high sensitivity and specificity. Finally, the effect of sample size on model performance was examined. The results suggest that the MTDNN model achieved better performance on datasets with insufficient ligand samples.

(3) In the third section, a graph-based neural network architecture, MPTransformer, was proposed for molecular representation. Nine benchmark datasets collected from MoleculeNet were used for an unbiased performance evaluation. MPTransformer was then applied to the virtual screening of LSD1 inhibitors. The assay results suggest that six compounds may be potential LSD1 inhibitors with micromolar affinities. The putative binding modes of these active compounds were analyzed by docking, which underscored the importance of hydrophobic interactions between the binding-site residues and the compounds. In addition, the efficiency of three virtual screening (VS) methods, MPTransformer, an RF model, and docking, was compared. In our experiments, MPTransformer outperformed the other VS methods in terms of enrichment factor and can be a valuable tool in virtual screening.

Overall, this thesis explored the application of different molecular representations in drug screening and lead discovery by combining cheminformatics and molecular simulation methods.
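
The enrichment factor used to compare the virtual screening methods in section (3) is not defined in this abstract. As a minimal sketch, assuming the commonly used definition (the hit rate among the top-ranked fraction of the library divided by the hit rate of the whole library), it could be computed as follows. The function name, its parameters, and the toy data are hypothetical and are not taken from the thesis.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction of a ranked library.

    Assumed definition (not stated in the abstract):
        EF = (actives in top subset / size of top subset)
             / (actives in library / size of library)

    scores   -- predicted scores, higher means more likely active
    labels   -- 1 for a known active, 0 for an inactive/decoy
    fraction -- top fraction of the ranked library to inspect (0 < fraction <= 1)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Size of the top-ranked subset; always inspect at least one compound.
    n_top = max(1, round(fraction * n_total))

    # Rank compounds from highest to lowest score and count actives at the top.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    return (hits_top / n_top) / (n_actives / n_total)


# Toy library of 10 compounds with 2 actives (illustrative data only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
perfect_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # both actives ranked first
partial_labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # one active in the top 20%

ef_perfect = enrichment_factor(toy_scores, perfect_labels, fraction=0.2)
ef_partial = enrichment_factor(toy_scores, partial_labels, fraction=0.2)
```

Under this definition a value of 1 corresponds to random ranking, and the maximum attainable value is bounded by the inverse of the chosen fraction, so enrichment factors are only comparable between methods when the same fraction and the same library are used.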
Keywords/Search Tags:Molecular representation, BRS-3D, Deep learning, Virtual screening, MAO, GPCR, LSD1, Subtype selectivity