| Drug–target interactions (DTIs) are believed to be an essential part of genomic drug discovery, and computational prediction of DTIs can accelerate the search for lead drugs for a given target, compensating for the time-consuming and expensive nature of wet-lab techniques. Currently, many computational methods predict DTIs from the sequence composition or physicochemical properties of the drug and target, but further effort is needed to improve them. In this article, we take drug and protein sequences as input, study deep-learning-based representations of drugs and protein targets through extensive experiments, and propose sequence-based methods for accurately identifying DTIs. The main research contents are as follows: 1) Building a classifier, called “iCDI-W2vCom”, for identifying ion channel–drug interactions. Ion channels are the second largest drug target family. Ion channel dysfunction may lead to a number of diseases, such as Alzheimer's disease, epilepsy, cephalagra, and type II diabetes. Many protein sequence-based predictors have been developed to address this challenge, but most of their results need improvement or their web servers are missing. In this article, a sequence-based classifier, iCDI-W2vCom, was developed to identify the interactions between ion channels and drugs. In the predictor, the drug compound was formulated by SMILES-word2vec, FP2-word2vec, SMILES-node2vec and ECFPs as a 1184-D vector; the ion channel was represented by word2vec as a 64-D vector, and we found that using AAindex to encode words (combinations of any three consecutive amino acids) yields better word vectors for ion channel representation; finally, the LightGBM model was used as the prediction engine. The accuracy and AUC achieved by iCDI-W2vCom under 5-fold cross-validation were 91.95% and 97.21%, respectively, which are remarkably higher than those of any existing predictor on this dataset. A user-friendly web server for iCDI-W2vCom was established at 
http://121.36.221.79/icdiw2v/. 2) Building a predictor with good generalization, called "DTI-BERT". The new generation of large-scale pre-trained models provides a better framework for sequence representation and offers an opportunity for a breakthrough in sequence-based DTI prediction. For target proteins, we explored using pre-trained Bidirectional Encoder Representations from Transformers (BERT) to extract sequence features, which can provide unique and valuable pattern information. For drug molecules, many state-of-the-art algorithms were tested, and the Discrete Wavelet Transform (DWT) was employed to generate information from the drug molecular fingerprint because of its good performance. The feature vectors of each DTI pair were concatenated and fed into feature extraction modules, a BRL block and a CNN model, to further extract DTI features. Subsequently, a BRL block was used as the prediction engine. After the model was optimized with contrastive loss and cross-entropy loss, it achieved prediction accuracies of up to 90.1%, 94.7%, 94.9% and 89% on the target families of G protein-coupled receptors, ion channels, enzymes and nuclear receptors, respectively, indicating that the proposed method can outperform existing predictors. For the convenience of researchers, a web server for the new predictor is freely accessible at http://121.36.221.79/dtibert/. The proposed method may also be a potential option for other DTIs. |
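
The abstract names three sequence-handling steps: splitting a protein into words of three consecutive amino acids, applying a discrete wavelet transform to a drug fingerprint, and concatenating the drug and target vectors. The following is a minimal Python sketch of these steps, not the authors' implementation. The function names are hypothetical. The overlapping sliding window and the single-level Haar wavelet are assumptions, since the abstract specifies neither. Only the 1184-D and 64-D dimensions are taken from the text.

```python
from math import sqrt


def three_mer_words(sequence):
    """Split a protein sequence into words of three consecutive residues.

    Assumption: an overlapping sliding window with stride 1.
    """
    return [sequence[i:i + 3] for i in range(len(sequence) - 2)]


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Assumption: the Haar wavelet is used only for illustration; the
    abstract does not state which wavelet family was applied.
    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / sqrt(2)
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2)
              for i in range(0, len(signal), 2)]
    return approx, detail


def pair_features(drug_vec, target_vec):
    """Concatenate a drug vector and a target vector into one pair vector."""
    return list(drug_vec) + list(target_vec)


if __name__ == "__main__":
    # Toy inputs only; these are not data from the study.
    print(three_mer_words("MKTAY"))
    print(haar_dwt([1.0, 1.0, 0.0, 0.0]))
    # 1184-D drug vector + 64-D ion channel vector, as stated in the abstract.
    print(len(pair_features([0.0] * 1184, [0.0] * 64)))
```

In a pipeline of this kind, the words would be passed to a word-embedding model, and the concatenated pair vector to the classifier (LightGBM in the first predictor).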