Computer-Aided Bioactivity Prediction of Epidermal Growth Factor Receptor Inhibitors

Posted on: 2024-09-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D H Huo
GTID: 1524307091964699
Subject: Chemical Engineering and Technology
Abstract/Summary:
Since 2020, cancer incidence and mortality have continued to rise worldwide. According to statistical data, more than 19 million new cancer cases are diagnosed each year, with over 9.9 million deaths. Breast, lung, and colorectal cancers in particular account for the vast majority of cases and deaths. The epidermal growth factor receptor (EGFR) plays a crucial role in tumor cell proliferation, angiogenesis, tumor invasion, metastasis, and apoptosis. Aberrant expression of the EGFR protein is a critical signal in the development of cancer, making EGFR an important target for anticancer drug design. However, resistance caused by EGFR mutations remains a significant obstacle in developing small-molecule EGFR inhibitors. Research on innovative drugs targeting EGFR therefore holds considerable potential for cancer treatment.

This thesis takes EGFR as its research subject, investigates the structure-activity relationships of EGFR inhibitors using cheminformatics and artificial intelligence methods, and uses virtual screening with experimental validation to discover novel EGFR inhibitors. To this end, various machine learning algorithms are used to study the structure-activity relationships of inhibitors of wild-type EGFR, L858R/T790M double-mutant EGFR, and L858R/T790M/C797S triple-mutant EGFR, and several structure-activity relationship models for predicting the biological activity of EGFR inhibitors are constructed. On this basis, hierarchical ligand-based and receptor-based virtual screening was applied to a library of over 5 million compounds, and novel EGFR inhibitors were successfully identified. The main content of this thesis covers the following aspects:

(1) Investigating the structure-activity relationships of EGFR inhibitors using various machine learning methods. A dataset containing structural
and bioactivity data for 5,371 EGFR inhibitors was established, with half-maximal inhibitory concentrations (IC50) ranging from 0.003 nM to 6,500 μM. The compounds were then labeled by activity: those with an IC50 below 100 nM were considered highly active, and those with an IC50 above 1,000 nM were considered weakly active. Three fingerprint descriptors (ECFP4, MACCS, and RDK) and three physicochemical descriptors (CORINA, MOE, and RDKit) were calculated, and 24 qualitative classification models were built by combining them with support vector machines (SVM), random forests (RF), logistic regression (LG), and fully connected neural networks (FCNN). The applicability domain of the models was also explored. The models were evaluated by prediction accuracy (Q) and Matthews correlation coefficient (MCC), and the SVM-ECFP4 model performed best: Q was 0.92 in five-fold cross-validation on the training set, and Q and MCC on the test set reached 0.94 and 0.88, respectively, laying a foundation for subsequent EGFR inhibitor screening. In addition to the classification models, 1,301 compounds measured by fluorescence assays were selected from the dataset, CORINA descriptors were calculated for them, and six quantitative regression models were built using SVM and multiple linear regression (MLR). On the test set, the SVM regression models averaged a coefficient of determination (r²) of 0.731, a mean absolute error (MAE) of 0.538, and a root mean square error (RMSE) of 0.713. Furthermore, k-means clustering on ECFP4 fingerprints was used to group the EGFR inhibitors by structure into eight compound groups. The molecular scaffolds and fragment features of each group were analyzed, and fragments associated with high and low activity were identified. These results offer guidance for the future discovery and design of new EGFR inhibitors.

(2) Discovering novel EGFR inhibitors through the comprehensive application of various
computational methods and screening strategies. In the first strategy, molecular 3D shape similarity comparison was combined with quantitative structure-activity relationship (QSAR) model prediction, which successfully identified novel EGFR inhibitors from over 5 million compounds. First, molecular 3D shape similarity scores were computed against three representative query molecules, two derived from crystal structures and one produced by a graph-based deep generative model. Next, compounds predicted to be highly active were selected using the QSAR model. Finally, in an EGFR inhibition bioassay, nine structurally novel EGFR inhibitors (IC50 less than 10 μM) were identified among 18 tested compounds, and three hit compounds (hit 1, hit 5, and hit 6) inhibited EGFR with IC50 values of around 80 nM. Moreover, the MM/GBSA binding free energies of hit 1, hit 5, and hit 6 were calculated from molecular dynamics simulations, and all values were lower than -49 kcal/mol. Key residues involved in their interactions with the EGFR binding pocket were also investigated. In the second strategy, molecular 3D shape and electrostatic similarity comparison was combined with molecular docking scoring to find novel EGFR inhibitors. First, two potent EGFR inhibitors, AEE788 and Afatinib, were selected as query molecules, and highly ranked compounds were selected by a similarity search based on molecular 3D shape and electrostatics. Then, molecular docking was used to estimate the binding affinity of each compound for the receptor and to rank the compounds accordingly. EGFR inhibition bioassays showed that 12 of the 13 selected compounds were novel active EGFR inhibitors. Three compounds (A_1, A_2, and A_3) had IC50 values between 100 nM and 1,000 nM. This work confirms the effectiveness of cascade virtual screening strategies in discovering novel EGFR
inhibitors.

(3) A dataset of 379 compounds with inhibitory activity against wild-type EGFR (EGFRwt) and L858R/T790M double-mutant EGFR (EGFRL858R/T790M) was established. Using ECFP4 fingerprints or SMILES as inputs, six 2D classification models were built with support vector machines, random forests, and self-attention recurrent neural networks. The models achieved Q values above 0.98 and MCC values above 0.76. Analysis of the important fragments in highly active compounds showed that inhibitors containing phenylamine quinoline or phenylamine quinazoline, as well as those with methoxy- or fluorine-substituted phenyl groups, were more likely to be highly active against EGFRwt. For EGFRL858R/T790M inhibitors, phenylamine pyrimidine, amide, phenylamine, methoxyphenyl, and thiophene pyrimidine amide were all identified as highly active fragments. Next, based on ECFP4 fingerprints, the 379 compounds were divided into six categories using a self-organizing map (SOM) neural network. Most purine compounds showed high inhibitory activity against both EGFRwt and EGFRL858R/T790M, phenylamine pyrimidine compounds showed high inhibitory activity against EGFRL858R/T790M, and phenylamine quinoline or phenylamine quinazoline compounds showed high inhibitory activity against EGFRwt. Three groups of compounds with purine, phenylamine pyrimidine, and phenylamine quinoline/phenylamine quinazoline scaffolds were selected, and three-dimensional comparative molecular similarity index analysis (CoMSIA) models were built. Analysis of the steric, electrostatic, hydrophobic, hydrogen-bond donor, and hydrogen-bond acceptor contour maps identified favorable and unfavorable substituent types for inhibiting EGFRwt and EGFRL858R/T790M. These findings offer guidance for understanding and designing inhibitors of EGFRwt and EGFRL858R/T790M.

(4) Quantitative structure-activity relationship study of the L858R/T790M/C797S triple-mutant
EGFR (EGFRL858R/T790M/C797S) inhibitors and bioactivity prediction of EGFR inhibitors. The highly active EGFRL858R/T790M/C797S inhibitors BLU945, CH7233163, and TQB3804 were selected as query templates for parallel screening. First, the top 500 compounds ranked by three-dimensional shape similarity score were selected. The first screening scheme used ligand-based quantitative structure-activity relationship models. A dataset of 290 EGFRL858R/T790M/C797S inhibitors was established, ECFP4, MACCS, CORINA, and RDKit descriptors were calculated, and four SVM classification models were built. The test-set MCC values of the models were all above 0.75 and the areas under the ROC curve (AUC) were above 0.87, indicating good predictive ability. Selecting compounds with a consensus model ultimately yielded nine candidate compounds. In addition, SOM clustering was used to cluster compounds of known and unknown activity together, so that structurally similar compounds were mapped to the same neuron. The activity of each unknown compound was predicted from the activity of the known compounds in its neuron, which yielded five candidate compounds. Enzymatic inhibition assays showed that compounds AM01, AS01, and AS02 had IC50 values of 106.4 nM, 524.6 nM, and 145.3 nM against EGFRL858R/T790M/C797S, respectively. The second screening scheme used ligand-receptor interactions to discover active inhibitors. Molecular docking was performed against the corresponding co-crystal structures from the PDB database, and consensus scoring yielded 20 candidate compounds with predicted high binding affinity. Enzymatic inhibition assays showed that compounds TD01, TD02, and TD03 had IC50 values of 7.6 nM, 33.9 nM, and 95.3 nM against EGFRL858R/T790M/C797S, respectively. Structural analysis showed that all hit compounds were novel EGFRL858R/T790M/C797S
inhibitors. These compounds can be used for further research and offer leads for addressing the C797S resistance mutation.

In summary, this thesis takes the epidermal growth factor receptor (EGFR) as its research target and extensively investigates the structure-activity relationships of EGFRwt, EGFRL858R/T790M, and EGFRL858R/T790M/C797S inhibitors. By combining ligand structure and physicochemical properties with ligand-receptor interaction information, several computational methods, including similarity comparison, machine learning model prediction, molecular docking, and molecular dynamics simulation, were assembled into a joint virtual screening workflow. This workflow helped discover potential novel EGFR inhibitors and provides a reference for the development of anticancer drugs.
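
The activity labeling and the two evaluation metrics mentioned in aspect (1) can be illustrated with a short Python sketch. It is not the thesis code: the function names, the sample IC50 values, and the predictions are hypothetical, and leaving compounds between 100 nM and 1,000 nM unlabeled is an assumption, since the abstract only states how compounds below 100 nM and above 1,000 nM were treated.

```python
import math

HIGH_CUTOFF_NM = 100.0   # IC50 below this: highly active (cutoff from the abstract)
LOW_CUTOFF_NM = 1000.0   # IC50 above this: weakly active (cutoff from the abstract)


def label_by_ic50(ic50_nm):
    """Return 1 (highly active), 0 (weakly active), or None (in between).

    Leaving the 100-1,000 nM band unlabeled is an assumption of this sketch.
    """
    if ic50_nm < HIGH_CUTOFF_NM:
        return 1
    if ic50_nm > LOW_CUTOFF_NM:
        return 0
    return None


def accuracy_and_mcc(y_true, y_pred):
    """Compute accuracy (Q) and Matthews correlation coefficient (MCC)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    q = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q, mcc


# Made-up IC50 values (in nM) and made-up predictions, for illustration only.
ic50_values = [0.5, 20.0, 80.0, 450.0, 1500.0, 30000.0, 6.5e6]
labels = [label_by_ic50(v) for v in ic50_values]
y_true = [y for y in labels if y is not None]  # drops the 450 nM compound
y_pred = [1, 1, 0, 0, 0, 1]
q, mcc = accuracy_and_mcc(y_true, y_pred)
```

MCC is reported alongside Q because accuracy alone can look strong when one activity class dominates the dataset, whereas MCC takes all four cells of the confusion matrix into account.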
Keywords/Search Tags: epidermal growth factor receptor (EGFR) inhibitors, virtual screening, quantitative structure-activity relationship, L858R/T790M mutation, L858R/T790M/C797S mutation