Currently,a large number of computational methods have been successfully applied to accelerate the drug discovery process.With the continuous emergence of biological and chemical data,it has become possible to design and utilize deep learning models to predict the binding affinity between drugs and proteins.Currently,most of the pioneering work has focused on representation learning at the amino acid or atomic scale of proteins or drug molecules.These works first encode proteins and small molecules into sets of embeddings of amino acids or atoms,and then the global pooling layer in the deep learning model aggregates multiple embeddings of amino acids or atoms.Finally,a simple feedforward neural network is connected and trained as part of the entire model for optimization to produce affinity prediction values.It should be noted that these pioneering works focus on the reasonable representation of proteins or drug molecules,while the global embedding aggregation method is often ignored by most study on drug-protein affinity.In completed works,the embedding aggregation part is usually composed of an average pooling layer and a feedforward neural network,which is the most concise and direct way.However,its disadvantages are also evident,as the model is too simple and a large amount of noise is not filtered and is input into subsequent model components.To address the issues mentioned earlier,this study designed a series of novel modules and proposed a new affinity prediction model,rzMLPDTA,which includes two core modules:a mean pooling based on multi-head mechanism(Multi-Head Mean Block,MHM)module and a gate-enhanced multilayer perceptron based on ReZero(Gated Multilayer Perceptron with ReZero,rzMLP)module.The MHM module can generate multiple global feature embeddings applicable to both proteins and small molecules,while the rzMLP module is a novel and slightly more complex global embedding aggregation method.Specifically,MHM combines the multihead mechanism and pooling techniques,while rzMLP is based on two recent advances in the field of deep learning:the gMLP model and the ReZero technique.The gMLP model is a multilayer perceptron with gate mechanism suitable for aggregating fixed-size embedding sets,while the ReZero technique is used to optimize the training process of the model.In addition,this study also utilized a protein language model to encode proteins,resulting in more expressive protein feature embeddings.By leveraging the strengths of the aforementioned components,rzMLPDTA can learn complex global functions without worrying about the difficulty of training deep models.This study tested the performance of rzMLPDTA on the Davis dataset and KIBA dataset,and the results show that our method outperforms the baseline methods on all evaluation metrics.Our ablation experiments also demonstrate that all the designed modules contribute to improving the performance of predicting protein-ligand affinity.Compared to traditional methods,the rzMLP module can improve the accuracy of drug-protein affinity prediction by 33%.In addition,our model is compared with other deep learning methods.The final results show that our model achieves the lowest mean squared error on both test datasets,which are 0.205 and 0.142,respectively. |