| Binding free energy evaluation measures the strength of binding between molecules. It is used to study molecular interactions in computational chemistry, materials science, biophysics, and other fields, and is of great significance for tasks such as the discovery of new materials, new mechanisms of action, and new drugs. Commonly used computational methods are based on quantum mechanics, molecular dynamics, or molecular force fields. The first two offer high accuracy at high computational cost, while the third reduces the cost but also the accuracy because it relies on many approximations and assumptions. To meet the needs of large-scale virtual screening, scoring functions based on molecular force fields are commonly used in biochemistry to evaluate the binding free energy between molecules and proteins. Such a function approximates the binding free energy as a linear sum of force field terms, and its limited accuracy greatly reduces the efficiency of identifying effective structures. Moreover, for a compound space on the order of 10^60 molecules, calculating the binding free energy one molecule at a time is not feasible. Deep learning methods are developing rapidly, and many fields are exploring how well they adapt to their own problems. In computational physics, deep learning has been widely used to balance speed and accuracy. Against this background, this paper starts from the principles of protein-molecule interaction calculation, uses the latest deep learning algorithms to extract interaction features from protein-molecule complexes, and studies in depth how to build a binding free energy evaluation algorithm with higher performance and generalization ability. Then, guided by this algorithm, the huge chemical space is explored by combining an optimization algorithm with generative models to discover new molecular 
structures with greater activity and selectivity faster and more accurately. In addition, a forward-looking molecular optimization model is constructed, which attempts to optimize the activity and multiple physicochemical properties of molecules in a coordinated way. By combining these three parts, this paper builds an end-to-end modeling framework covering binding free energy assessment, molecular structure design, and molecular structure optimization, which realizes a fully automated and effective drug discovery process and provides a fast and accurate computational tool for new drug design. The main contents and conclusions of this paper are as follows. 1. A binding free energy evaluation model, Deep Scoring, based on residual networks and dual-attention mechanisms is developed. The residual network extracts high-level feature representations of protein-molecule complexes, and the dual-attention mechanism then generates joint attention over each protein-molecule pair, evaluating the contributions of paired residue atoms and ligand atoms to the binding free energy. The reasons for the poor generalization ability of existing algorithms are analyzed, and the key to improving generalization is identified: guiding the model to learn protein-molecule interaction rules rather than simple differences in molecular structure. The model achieves state-of-the-art classification and generalization performance on public datasets, outperforming molecular docking programs such as AutoDock Vina and Glide, which combine molecular force fields with empirical scoring functions. 2. A Transformer-decoder-based de novo molecular design model and a Wasserstein Auto-Encoder (WAE)-based de novo peptide design model are developed. Because the chemical space is highly complex, the sequences of small molecules represented in SMILES format have contextual associations, while the amino acid sequences 
that make up peptides lack such associations. Therefore, we construct the de novo molecular design model using the Transformer language model and the de novo peptide design model using a WAE, which can fully map sequences to a high-dimensional latent space. The pre-trained Transformer-decoder model is fine-tuned by transfer learning and reinforcement learning to bias the likelihood function of the model toward the chemical space with specific properties. The evaluation results show that the Transformer-decoder-based generative model achieves the highest percentage of valid molecules to date (98%) and outperforms previous models in robustness and in the structural diversity of the generated molecules. Evaluation of the WAE generative model shows that it achieves the lowest perplexity and reconstruction loss on the test set. By using the particle swarm optimization (PSO) algorithm to search the WAE latent space for optimal solutions that satisfy the constraints, new peptide structures with specific properties can be generated. In addition, the PSO search intuitively reveals the semantic linkage preserved in the WAE latent space, which makes PSO optimization much faster than with a Variational Autoencoder (VAE) model. These two generative models have been validated on human breast cancer and lung cancer targets, and the generated novel chemical structures are found to have low binding free energy according to Deep Scoring, Glide, and AutoDock CrankPep. 3. A Transformer-based forward-looking molecular optimization model is developed. The molecular structure optimization problem is transformed into a conditional generation problem by using a complete Transformer model with the desired properties and molecular activities as constraints. Unlike traditional matched molecular pair analysis methods, our optimization algorithm is a forward-looking algorithm that generates target molecules by sampling in the chemical space 
that satisfies the property constraints. The model achieves a success rate of 82% for single-property optimization of the starting molecule and 59% for simultaneous optimization of multiple properties. As the range of property variation between the target and starting molecules is narrowed, the performance of the model improves further. Using a molecular structure optimization task for BRAF as an example, it is demonstrated that our molecular optimization model can be used effectively for coordinated molecular optimization, where the target molecule has a binding free energy comparable to that of the starting molecule while also having good pharmacokinetic properties. In summary, replacing traditional algorithms with deep learning is an effective way to balance computational speed and accuracy when evaluating binding free energy and predicting physicochemical properties in macromolecular systems. Moreover, using deep neural networks to capture patterns and features in high-dimensional chemical space allows for the fast and efficient design of molecules with specific properties. These algorithms are of great significance for targeted molecular design and structural optimization. |
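
The latent-space search described in item 2, where particle swarm optimization (PSO) looks for points that satisfy property constraints, can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the swarm hyperparameters, and the quadratic `toy_score` (a stand-in for decoding a latent vector and scoring the resulting peptide) are all assumptions made for illustration.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 bounds=(-3.0, 3.0), inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box in R^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # best position seen by each particle
    best_val = [objective(p) for p in pos]    # score at that position
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_score(z, target=(1.0, -0.5, 0.25, 2.0)):
    """Hypothetical objective: squared distance to a made-up 'ideal' latent point.

    In a real pipeline this would decode z into a sequence and return a
    property or binding score to minimize; that part is not modeled here.
    """
    return sum((zi - ti) ** 2 for zi, ti in zip(z, target))


best_z, best_score = pso_minimize(toy_score, dim=4)
```

In practice the objective would wrap the generative model's decoder and a scoring model, and the smoothness of the latent space largely determines how quickly the swarm converges, which is consistent with the speed-up reported above for the WAE relative to the VAE.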