Traditional drug development is a time-consuming, expensive, and complicated process with a low success rate. The biochemical data generated by various measurement technologies are growing at an exponential rate, which provides new opportunities for mining associations among the data with machine learning methods. Molecular property prediction is a basic task in drug discovery because it helps to determine the function of drugs. Molecular representation research is an important basis for drug screening and new drug discovery, and it plays a decisive role in predicting molecular properties and biochemical activity, as well as in antibiotic screening. Finding chemical substances with good pharmacological, toxicological, and pharmacokinetic properties is a major challenge in drug discovery.

The three-dimensional (3D) characteristics of a molecule largely determine its properties and how it binds to its target. However, the 3D topological structure is strongly affected by molecular conformation and by relative position and orientation, which leads to high cost and low accuracy in existing 3D molecular representation methods. At the same time, most current molecular representation methods are based on one-dimensional (1D) or two-dimensional (2D) molecular properties and ignore the 3D structure of molecules, which reduces the accuracy of molecule-related downstream tasks. Given these limitations, molecular representation learning needs to be modeled as a whole, so that features are designed and extracted effectively while fully accounting for the spatial topological structure of molecules.

Against this background, this dissertation proposes a multi-perspective learning method that extracts features from the "original" 3D structure of drugs from different perspectives. Specifically, the main research contents of this dissertation include the following aspects:

· This dissertation proposes a molecular
topology feature extraction algorithm based on a spatial-temporal gated attention module. To address the problems of long-range dependency and semantic similarity, the attention mechanism is applied to the 3D grid representation of molecular structure. The spatial-temporal gated attention module consists of spatial attention, channel attention, and a gate mechanism. The spatial attention module extracts 3D molecular features and produces spatial attention scores, the channel attention module produces attention scores for the different channels (atom types), and the gate mechanism integrates these two kinds of attention to obtain a global, molecule-level 3D grid attention.

· This dissertation proposes an adaptive graph convolutional neural network algorithm based on 3D rotation invariance. To address the problem that 3D rotation invariance is not satisfied, a rotation-invariant molecular feature mapping algorithm is proposed. It ensures that a deep neural network constructed in 3D space satisfies a certain "invariance", which supports the generalization ability of the model. On this basis, an adaptive graph convolutional neural network, 3DMol-Net, is proposed as a general 3D representation method for drug molecules. Its adaptability is reflected in the following aspects: it accepts any molecular structure as input, automatically extracts 3D features, automatically constructs a residual graph Laplacian neural network, and can be applied end to end to any task in any scenario.

· This dissertation proposes a molecular property prediction algorithm based on augmentation with multiple Simplified Molecular Input Line Entry System (SMILES) strings. To address the overfitting problem on small datasets, and inspired by natural language processing techniques, molecular SMILES strings have been used as character-level input to the deep neural network
model. However, deep learning models are hindered by the non-uniqueness of SMILES strings. To learn molecular features effectively along all information paths and to capture different structural features of molecules, multiple SMILES strings are encoded for each molecule as a form of automatic data augmentation in the molecular property prediction task, which alleviates the overfitting caused by the small size of molecular property prediction datasets.

· This dissertation proposes a molecular generation algorithm based on 3D molecular geometry. To address drug design with 3D representations, the molecular generation algorithm GEOM-CVAE is proposed, based on geometry and a constrained variational autoencoder. GEOM-CVAE combines a 3D structure-based visual representation of molecules with a 3D mesh-based graph representation of proteins to generate molecules with special properties. Unlike previous 1D- or 2D-based molecular generation methods, it treats 3D geometric information as essential for successful molecular generation and design. GEOM-CVAE generates molecules in two stages: first, it transforms the 3D coordinates of molecules into special images to learn a latent-space representation; second, it uses geometry-based graph convolution to extract protein features, which serve as constraints on the model for generating molecules with special properties.

In summary, this dissertation focuses on the characteristics of molecular spatial structure on the basis of 3D rotation invariance and deep neural networks, and explores 3D molecular representation through an adaptive, multi-perspective learning method for predicting molecular properties, biochemical activity, and toxicity. It also provides new ideas and insights for molecular generation models. Extensive experiments demonstrate the effectiveness of the proposed methods on molecular property prediction tasks and their superiority over the baseline methods. The 3D-based molecular representation
research has practical significance and application prospects.
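
The rotation-invariance requirement described above can be illustrated with a minimal Python sketch. It assumes a deliberately simple descriptor, the sorted list of pairwise interatomic distances, which is not the feature mapping used by 3DMol-Net; the function names `rotate_z`, `rotate_x`, and `distance_features` and the toy coordinates are hypothetical and chosen only for illustration. The sketch shows that rotating a molecule changes its raw coordinates but leaves such a descriptor unchanged, which is the property a rotation-invariant feature mapping is meant to provide.

```python
import math


def rotate_z(points, angle):
    """Rotate 3D points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def rotate_x(points, angle):
    """Rotate 3D points about the x-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in points]


def distance_features(points):
    """Sorted pairwise distances: an assumed, illustrative invariant descriptor."""
    dists = [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]
    return sorted(dists)


# Toy 3D coordinates for a three-atom structure (illustrative values only).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (-0.4, 1.1, 0.3)]

# Apply an arbitrary rotation composed of two axis rotations.
rotated = rotate_x(rotate_z(atoms, 0.7), 1.3)

original_features = distance_features(atoms)
rotated_features = distance_features(rotated)

# Raw coordinates differ after rotation...
coords_changed = any(
    not math.isclose(a, b, abs_tol=1e-9)
    for p, q in zip(atoms, rotated)
    for a, b in zip(p, q)
)
# ...but the distance-based descriptor is the same up to rounding error.
features_match = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(original_features, rotated_features)
)

print("coordinates changed:", coords_changed)
print("features unchanged:", features_match)
```

A model fed raw coordinates would see the two poses as different inputs, whereas a model fed an invariant descriptor sees the same input for both, so it does not need to learn the invariance from rotated training examples.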