Over the past few decades, phenotypic drug discovery has been limited by its inefficiency and low throughput. Recently, with the rapid development of next-generation sequencing technology, large-scale measurement of the phenotypic changes of cells under drug perturbation has become feasible, and interest in phenotypic drug discovery has resurged. The CMap (2006) and LINCS (2017) databases provided large-scale transcriptional profiles obtained by treating dozens of human cancer cell lines with tens of thousands of drugs, greatly promoting the development of this field. In phenotypic drug discovery, the key issue is how to accurately measure the similarity between transcriptional profiles. To this end, numerous computational strategies, including KS, GSEA and XSum, have been proposed. However, limitations remain; for example, the similarity between transcriptional profiles is defined manually in an unsupervised way. In addition, there is little research on how to apply phenotypic drug discovery to sudden virus outbreaks (such as SARS-CoV-2) so as to respond quickly and recommend treatment options in the absence of experimental data. To address these problems, two studies were conducted: (1) by adopting a metric learning algorithm, we optimized the existing algorithms and presented DrSim, which can be used for drug mechanism annotation, drug repositioning and personalized medicine; (2) we presented iDMer, which can be used for drug repositioning in sudden virus outbreaks.

The main contents and conclusions of the first study are as follows: (1) t-SNE dimensionality reduction and visualization showed that cell line and time point have a great impact on the distribution of drug perturbation transcriptional data, whereas drug concentration has little impact, indicating that cell line and time point should be taken into account when annotating drug mechanisms and repositioning drugs; (2) in the drug annotation scenario, we applied DrSim to internal and external validation datasets, and the results showed that the average accuracy of DrSim was 0.383 and 0.272, respectively, much higher than the 0.168 and 0.156 of the other tools. In addition, the accuracy of all tools, including DrSim, dropped somewhat on the external validation datasets, indicating that data heterogeneity affects their performance; (3) as training data accumulated, DrSim was the only tool whose performance improved further. With the accumulation of drug perturbation transcriptional data in the future, the learning-based DrSim is expected to perform even better; (4) in the drug repositioning and personalized medicine scenarios, we applied DrSim to eight cancer cell line datasets, three cancer type datasets and two patient datasets, and its average accuracy was much higher than that of the other tools; (5) for Alzheimer's disease, only DrSim successfully predicted the FDA-approved drug Memantine.

The main contents and conclusions of the second study are as follows: (1) virus outbreaks, such as that of SARS-CoV-2, are often sudden. To recommend treatment options at the initial stage of an outbreak, we presented iDMer, which is based on phenotypic drug discovery and can reposition effective drugs relying only on viral genome information; (2) iDMer can identify effective drugs against HIV, Ebola, MERS, SARS-CoV and SARS-CoV-2, supporting the validity of the approach; (3) using SARS-CoV-2 genome information, iDMer identified the effective drugs Homoharringtonine and Emetine; (4) adopting a graph attention neural network further improved the performance of iDMer; (5) iDMer predicts combinations of antiviral and anti-inflammatory drugs that both suppress viral replication and prevent patients from dying of cytokine release syndrome. Interestingly, our study found that Emetine has a dual antiviral and anti-inflammatory role, making it a highly promising compound against SARS-CoV-2 infection.
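The contrast drawn above between an unsupervised, manually defined similarity and a similarity learned from labelled profiles can be illustrated with a small sketch. It is not the algorithm used in this work: the learned metric here is a deliberately simple stand-in (each gene is rescaled by its within-class standard deviation before cosine similarity is computed), the data are simulated, and all function names (`fit_diagonal_metric`, `annotate`, `simulate` and so on) are hypothetical. A query profile is annotated with the mechanism class whose centroid it most resembles.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(rows):
    """Per-gene mean of a list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def group_by_label(profiles, labels):
    groups = {}
    for p, lab in zip(profiles, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def fit_diagonal_metric(profiles, labels, eps=1e-6):
    """Learn a per-gene scale from labelled profiles (assumed stand-in metric).

    Genes that vary a lot within a mechanism class are down-weighted.
    Returns the global mean profile and the per-gene scale factors.
    """
    groups = group_by_label(profiles, labels)
    n_genes = len(profiles[0])
    within = [0.0] * n_genes
    for rows in groups.values():
        mu = mean_vector(rows)
        for r in rows:
            for g in range(n_genes):
                within[g] += (r[g] - mu[g]) ** 2
    dof = max(len(profiles) - len(groups), 1)
    scale = [1.0 / math.sqrt(w / dof + eps) for w in within]
    return mean_vector(profiles), scale


def transform(profile, center, scale):
    """Center a profile and rescale each gene with the learned factors."""
    return [(x - c) * s for x, c, s in zip(profile, center, scale)]


def annotate(query, centroids):
    """Return the class label whose centroid is most similar to the query."""
    return max(centroids, key=lambda lab: cosine(query, centroids[lab]))


def accuracy(train_x, train_y, test_x, test_y, use_learned_metric):
    if use_learned_metric:
        center, scale = fit_diagonal_metric(train_x, train_y)
        train_x = [transform(p, center, scale) for p in train_x]
        test_x = [transform(p, center, scale) for p in test_x]
    groups = group_by_label(train_x, train_y)
    centroids = {lab: mean_vector(rows) for lab, rows in groups.items()}
    hits = sum(annotate(q, centroids) == lab for q, lab in zip(test_x, test_y))
    return hits / len(test_x)


def simulate(rng, n_per_class, n_genes=20, n_classes=3):
    """Simulated profiles: one informative gene per class, noisy nuisance genes."""
    profiles, labels = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            p = [rng.gauss(0.0, 0.3) if g < n_classes else rng.gauss(0.0, 5.0)
                 for g in range(n_genes)]
            p[c] += 2.0  # class-specific signal
            profiles.append(p)
            labels.append(c)
    return profiles, labels


rng = random.Random(0)
train_x, train_y = simulate(rng, 40)
test_x, test_y = simulate(rng, 20)
raw_acc = accuracy(train_x, train_y, test_x, test_y, use_learned_metric=False)
learned_acc = accuracy(train_x, train_y, test_x, test_y, use_learned_metric=True)
print(f"raw cosine accuracy: {raw_acc:.3f}")
print(f"learned-metric accuracy: {learned_acc:.3f}")
```

In this simulation the nuisance genes dominate the raw cosine similarity, whereas the learned scaling lets the few informative genes drive the comparison; real perturbation data would call for a richer transformation and careful handling of cell line and time point.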