
Compound-Protein Affinity Prediction Study Based On GNN And Transformer

Posted on: 2024-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
GTID: 2544307079493144
Subject: Engineering · Computer Science and Technology
Abstract/Summary:
The COVID-19 pandemic, which lasted three years and has only recently subsided, has pushed drug developers to find ways of developing drugs more efficiently. Drug development can alleviate patients' symptoms, reduce the burden on society, and improve people's health. The traditional process of new drug development is time-consuming and laborious, so drug repurposing has become a viable option for drug developers. Whether for new drug development or drug repurposing, the key step is to determine the interaction between the drug candidate and the disease target. Binding affinity is an important indicator of this interaction: a compound can produce a drug effect only if it binds to the target with high affinity. In the era of big data and artificial intelligence, existing databases contain large amounts of data on drug compounds, protein targets, and their interactions, which allows drug repurposing studies to expand beyond existing drug libraries to a larger compound space in search of potential drugs or lead compounds with high affinity. Traditional methods for calculating compound-protein affinity can no longer meet current demands for efficiency, and with deep learning now widely used across many fields, developing accurate and efficient deep learning methods for compound-protein affinity calculation is of great importance for accelerating drug discovery.

In this thesis, we construct a deep learning model for predicting compound-protein affinity based on a graph neural network (GNN), the Transformer model, and a Mutual-Attention mechanism. The main work is as follows:

(1) First, each compound molecule is represented as a molecular graph, and rich chemical features are constructed for each atom, so that both the two-dimensional topological structure and the chemical information of the molecule are taken into account; the overall features of the molecule are then extracted using a Graph Isomorphism Network (GIN). For proteins, a Transformer encoder based on the self-attention mechanism extracts features from the amino acid sequence and generates high-quality feature embeddings by comprehensively learning the relationships between individual amino acid residues. This overcomes two drawbacks of previous work: SMILES sequences used to characterize compound molecules cannot carry rich structural and chemical features, and CNNs have difficulty extracting comprehensive molecular information from sequence data. Finally, the overall compound embedding and the overall protein embedding are concatenated and fed into a fully connected neural network to obtain the predicted affinity. Comparison with seven state-of-the-art benchmark models on two datasets shows that the model is highly competitive with cutting-edge methods and that the molecular representation and feature extraction steps are effective.

(2) Building on (1), and with the aim of enhancing the interpretability of deep learning models, we define the interpretability of compound-protein interactions in terms of the intermolecular interactions between atoms in the compound and amino acids in the protein. We add a Mutual-Attention module that enables the model to automatically capture the compound atoms and amino acid residues that contribute most to the interaction (those with high attention scores). Drug compounds usually bind and exert their effects at specific locations in the target protein, called binding sites (interaction sites), so atoms and residues with high attention scores can be regarded as potential interaction sites. The refined model further improves affinity prediction performance, and we visualize the attention scores of its Mutual-Attention module for a protein and its ligand. The results show that the high-attention sites partially overlap with the actual interaction sites, which both enhances interpretability and helps drug developers narrow the search space for interaction sites. Finally, the improved model was used to screen the FDA-approved drug library for drugs with high affinity for Alzheimer's disease (AD)-associated proteins. Some of the high-affinity drugs identified are already present in the anti-AD drug library, illustrating the practical applicability and reliability of the improved model in drug development.

In summary, the improved model offers high accuracy and interpretability and can be applied to the discovery of high-affinity drugs for specific diseases.
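
The two ideas at the core of the abstract, GIN-style neighbourhood aggregation over a molecular graph and attention scores between atoms and residues, can be illustrated with a small standard-library sketch. This is not the thesis's implementation: the learned MLP that a real GIN layer applies after aggregation is omitted, the dot-product scoring in `mutual_attention` is an assumed form (the abstract does not specify how the Mutual-Attention module computes its scores), and the function names and toy feature vectors are hypothetical.

```python
import math


def gin_layer(features, adjacency, eps=0.0):
    """One GIN-style update with the learned MLP omitted (assumption):
    h_v' = (1 + eps) * h_v + sum of the neighbours' feature vectors."""
    updated = []
    for v, h_v in enumerate(features):
        agg = [(1.0 + eps) * x for x in h_v]
        for u in adjacency[v]:
            agg = [a + b for a, b in zip(agg, features[u])]
        updated.append(agg)
    return updated


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mutual_attention(atoms, residues):
    """Assumed scoring scheme: dot-product scores for every atom-residue pair,
    averaged per atom and per residue, then normalised with softmax."""
    scores = [[sum(a * r for a, r in zip(atom, res)) for res in residues]
              for atom in atoms]
    atom_means = [sum(row) / len(row) for row in scores]
    residue_means = [sum(scores[i][j] for i in range(len(atoms))) / len(atoms)
                     for j in range(len(residues))]
    return softmax(atom_means), softmax(residue_means)


# Toy compound: a three-atom chain 0-1-2 with hypothetical 2-d atom features.
atom_features = [[1, 0], [0, 1], [1, 1]]
bonds = {0: [1], 1: [0, 2], 2: [1]}
# Toy protein: two residues with hypothetical 2-d embeddings.
residue_embeddings = [[1, 0], [0, 2]]

atom_embeddings = gin_layer(atom_features, bonds)
atom_weights, residue_weights = mutual_attention(atom_embeddings, residue_embeddings)

# Atoms and residues with the highest weights play the role of candidate interaction sites.
top_atom = max(range(len(atom_weights)), key=lambda i: atom_weights[i])
top_residue = max(range(len(residue_weights)), key=lambda j: residue_weights[j])
print(top_atom, top_residue)
```

In a trained model the weights would come from learned projections rather than raw dot products, and the pooled compound and protein embeddings would then be concatenated and passed to the fully connected network that outputs the affinity.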
Keywords/Search Tags: drug discovery, compound-protein affinity, GNN, Transformer