With the rapid development of information technology, China's drug research and development has gradually entered a new era. From design to marketing, a new drug must pass through many stages, including drug discovery, preclinical research, and clinical trials. As the first of these stages, drug discovery plays an important role and relies heavily on predicting the binding affinities of drug molecules to suitable drug targets. Binding affinity is closely related to biochemical reactions in the body, and understanding the internal mechanisms of these reactions can help to prevent some diseases. However, determining affinities is time-consuming. To accelerate the identification of accurate affinities, computer-aided methods need to be applied in the drug discovery pipeline. The development of such methods has promoted further innovation in drug research and accelerated the development of new drugs. Over the past ten years, experience-based traditional methods, machine learning methods, and deep learning methods have all been developed. Although various computational methods have been applied to drug-target affinity prediction, the most successful methods to date use 3D convolutional neural networks (3D-CNNs). As deep learning models, 3D-CNNs are both faster and more accurate than machine learning methods. However, the CNNs currently in use have difficulty focusing on the region where the drug and the target interact and cannot learn global and spatial features, whereas this research hypothesizes that spatial features are critical for structure-based binding affinity prediction.

Based on this hypothesis, the main work of this thesis is to further optimize the 3D-CNN so that it better captures the spatial characteristics of the interaction between drug molecules and drug targets. This thesis proposes saCNN, an end-to-end 3D-CNN with spatial attention mechanisms that combines a convolutional neural network with an attention mechanism to encourage spatial feature learning. A series of experiments is designed to verify the effectiveness of the saCNN model. Visualizing the learned spatial attention shows that saCNN focuses more on the voxels near interaction centers, a key observation that supports the hypothesis that spatial features are critical for binding affinity prediction. In addition, a benchmark test on the PDBbind v.2016 core set shows that, compared with the current mainstream models, saCNN improves the Root Mean Square Error (RMSE) of the predicted binding affinities by 11.5% (to an absolute value of 1.117) and the Pearson Correlation Coefficient (R) by 3.2% (to an absolute value of 0.865). The generalization ability of the model is further demonstrated on the CASF-2013 and CASF-2007 datasets. More importantly, this thesis analyzes the problem of uneven data distribution and transforms the regression task into a classification task, which achieves good results with the currently popular models.
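To make the idea of spatial attention over a voxel grid concrete, the following is a minimal sketch and not the saCNN architecture itself, which the abstract does not specify. It assumes a CBAM-style spatial attention in which the feature map is pooled across channels, the pooled maps are combined by a 1x1x1 weighting, and a sigmoid yields one attention weight per voxel. The function name `spatial_attention_3d`, the parameters `w_avg`, `w_max`, and `bias`, and the grid size are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention_3d(features, w_avg=1.0, w_max=1.0, bias=0.0):
    """Reweight a voxel feature map with one attention value per voxel.

    features: array of shape (C, D, H, W), i.e. channels over a 3D grid.
    w_avg, w_max, bias: hypothetical scalars standing in for learned
        parameters (equivalent to a 1x1x1 convolution over two pooled maps).
    Returns (reweighted features, attention map of shape (D, H, W)).
    """
    # Summarize each voxel across channels in two ways.
    avg_pool = features.mean(axis=0)  # shape (D, H, W)
    max_pool = features.max(axis=0)   # shape (D, H, W)

    # Combine the two summaries and squash to the open interval (0, 1).
    attention = sigmoid(w_avg * avg_pool + w_max * max_pool + bias)

    # Broadcast the per-voxel weight over the channel axis.
    return features * attention[np.newaxis], attention


# Assumed toy input: 4 feature channels on an 8 x 8 x 8 voxel grid.
rng = np.random.default_rng(0)
grid = rng.normal(size=(4, 8, 8, 8))
weighted, attn = spatial_attention_3d(grid)
```

In a trained network the weighting would be learned, and a map such as `attn` is the kind of quantity that could be visualized to check whether high weights concentrate on voxels near the interaction center.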