Purpose. New drug discovery is divided into multiple steps, each requiring significant time and money, so it is critical to ensure that the right drug candidate is selected for the next stage. Although high-throughput screening can quickly and accurately identify drug candidates for specific targets, it is costly and has a high failure rate, which limits the discovery of new drugs. Deep learning has achieved great success in several fields, and a large body of research applies it to new drug discovery. Common molecular representations currently include molecular feature descriptors, molecular sequence information, voxelized grids, and molecular graphs; most methods built on them suffer from long pre-processing times and time-consuming high-throughput prediction. It is therefore important to find novel, fast, and effective representations for describing the structure of biological macromolecules.
Method. We propose applying point cloud-based deep learning models to protein-ligand affinity prediction and ligand binding site prediction. In this work, we use two such models: one based on a multilayer perceptron (PointNet) and one based on a self-attention mechanism and local sampling (PointTransformer). PointNet uses a multilayer perceptron to transform each point, which carries atomic information such as atomic coordinates, molecular weight, and molecular radius, into a 1024-dimensional vector, extracts the feature representation using a symmetric function, and completes the prediction. A feature vector of length 512 is generated and the prediction is completed.
Result. A comparison of pre-processing times across methods shows that the point cloud-based deep learning method is an order of magnitude faster in pre-processing. We built training and test sets for protein-ligand affinity prediction from the PDBbind-2016 and CASF-2016 datasets, and a training set for small-molecule binding site prediction from the sc-PDB dataset. The test set for small-molecule binding site prediction was built from COACH420 and HOLO4K. Because the point cloud-based models take atomic information as input, they can be visualized at the atomic level. Our targeted analysis of the cases with better and worse performance in protein-ligand affinity prediction will help us further optimize the model's predictions at a later stage.
Conclusion. In this thesis, we develop a series of point cloud-based deep learning models that can be applied to protein-ligand affinity prediction and ligand binding site prediction. Our study demonstrates that point cloud-based deep learning algorithms have promising applications in bioinformatics.
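The PointNet pipeline described in the Method section (a shared multilayer perceptron applied to every atom, a symmetric function over atoms, then a prediction head) can be illustrated with a minimal NumPy sketch. This is not the implementation used in this work: the weights are random and untrained, the five input features per atom (three coordinates plus two scalar properties), the hidden layer sizes 64 and 128, the 512 and 256 units in the head, the choice of max pooling as the symmetric function, and all function names are assumptions made for illustration only.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_mlp(sizes, rng):
    """Random, untrained weights for a stack of dense layers (illustration only)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def apply_mlp(x, layers, final_activation=True):
    """Apply dense layers with ReLU; optionally leave the last layer linear."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if final_activation or i < len(layers) - 1:
            x = relu(x)
    return x

def pointnet_forward(points, point_mlp, head):
    """points: (n_atoms, n_features) array, one row per atom."""
    # The same MLP is applied to every atom independently.
    per_point = apply_mlp(points, point_mlp)           # (n_atoms, 1024)
    # Symmetric function (assumed here: max pooling), so atom order is irrelevant.
    global_feature = per_point.max(axis=0)             # (1024,)
    # Regression head producing one scalar, e.g. a predicted affinity.
    return apply_mlp(global_feature, head, final_activation=False)  # (1,)

rng = np.random.default_rng(0)
n_atoms, n_features = 200, 5                # hypothetical: x, y, z plus two atom properties
point_mlp = init_mlp([n_features, 64, 128, 1024], rng)
head = init_mlp([1024, 512, 256, 1], rng)

atoms = rng.normal(size=(n_atoms, n_features))   # synthetic stand-in for a complex
pred = pointnet_forward(atoms, point_mlp, head)

# Shuffling the atoms leaves the prediction unchanged (up to floating-point error).
shuffled = atoms[rng.permutation(n_atoms)]
pred_shuffled = pointnet_forward(shuffled, point_mlp, head)
```

The comparison between `pred` and `pred_shuffled` shows why a symmetric function matters for this representation: a point cloud of atoms has no canonical ordering, so the model output should not depend on the order in which atoms are listed.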