Research On Molecular Property Prediction Methods Based On BERT

Posted on: 2024-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
GTID: 2544307160976529
Subject: Computer application technology
Abstract/Summary:
Throughout the history of humanity's fight against disease, drugs have played a vital role in many areas of human health, including prevention, diagnosis, and treatment. However, drug discovery is a complex and time-consuming undertaking that often takes ten to fifteen years and hundreds of millions of dollars. Molecular property prediction is an important and fundamental task in drug discovery: it predicts the physical and chemical properties of a large number of candidate compounds so that the few candidates meeting the requirements can be screened out. This reduces the cost and speeds up the progress of drug discovery. Molecular property prediction has therefore attracted considerable interest from industry and academia and has driven researchers to keep proposing more accurate and reliable prediction methods.

In recent years, with the development of bioinformatics and computer science, numerous machine learning and deep learning methods have been successfully applied to molecular property prediction. These methods typically model molecules through molecular fingerprints, SMILES representations, or molecular graphs, and they have achieved impressive performance. However, existing methods have several limitations. First, they consider only a single form of molecular representation and do not make comprehensive use of the available molecular information. Second, their model structures usually follow classical network designs and give insufficient consideration to integrating molecular domain knowledge into the architecture. Third, most current methods do not take full advantage of the large amounts of unlabeled molecular data, which limits the expressive power of the model. In short, although existing molecular property prediction methods perform well, there is still room for breakthroughs in data utilization and model design.

To address these issues, this study proposes a molecular knowledge-enhanced BERT method, KE-BERT. KE-BERT takes both the molecular SMILES representation and the molecular graph as inputs, and encodes the features of atoms and chemical bonds together with graph topology information to form the Fused Molecular Embedding, which integrates rich molecular information for model learning. In addition, we design a novel encoder layer based on the Mix Hop Self-Attention mechanism, so that KE-BERT extracts molecular features through mixhop message passing, which matches the structural characteristics of molecules. Finally, we introduce several pre-training tasks, including molecular feature completion tasks and molecular fingerprint prediction tasks, which drive KE-BERT to learn prior chemical knowledge from a large amount of unlabeled data. This significantly improves the expressive ability and generalization performance of the model.

Experiments show that KE-BERT generally performs better than existing methods on several molecular property prediction tasks. We also analyze the design of KE-BERT. The results show that the fused molecular embedding integrates multi-source information for model learning: the atom feature embedding and the bond feature embedding capture the local structure of a molecule, while the Laplacian positional embedding of the molecular graph captures its global topology. Moreover, KE-BERT with the mixhop self-attention mechanism performs better than with other attention variants, which suggests that the message passing patterns underlying the molecule are effectively integrated into the model's learning mechanism. Furthermore, all of the molecular pre-training tasks improve the performance of KE-BERT; the fingerprint prediction tasks in particular prompt the model to focus on important chemical substructures and achieve the best performance. Finally, this study conducts a visualization analysis of the molecular representations learned by KE-BERT, which indicates that the learned representations are both expressive and interpretable.
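
As a hedged illustration of two ingredients named in the abstract, the Python sketch below computes a Laplacian positional encoding and a hop-distance matrix for a small molecular graph. The abstract does not specify how KE-BERT computes either quantity, so everything here is an assumption made for illustration: the function names, the choice of the symmetric normalized Laplacian, the number of eigenvectors k, and the idea of using hop distances to tell an attention layer which atom pairs lie 1, 2, or more bonds apart.

```python
import numpy as np

def laplacian_positional_encoding(adj, k):
    """Hypothetical helper: eigenvectors of the symmetric normalized
    Laplacian L = I - D^{-1/2} A D^{-1/2}, used as per-atom positions.
    Returns the k eigenvectors that follow the first (lowest) one."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Isolated atoms (degree 0) get a zero scaling factor.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # The sign of each eigenvector is arbitrary.
    return eigvecs[:, 1:k + 1]

def hop_distances(adj):
    """Hypothetical helper: shortest-path length in bonds between every
    pair of atoms, found by breadth-first expansion. Unreachable pairs
    are marked with -1."""
    bonded = (np.asarray(adj) > 0).astype(int)
    n = len(bonded)
    dist = np.full((n, n), -1, dtype=int)
    np.fill_diagonal(dist, 0)
    reached = np.eye(n, dtype=bool)
    frontier = np.eye(n, dtype=bool)
    hop = 0
    while frontier.any():
        hop += 1
        # Atoms adjacent to the current frontier that were not seen before.
        frontier = ((frontier.astype(int) @ bonded) > 0) & ~reached
        dist[frontier] = hop
        reached |= frontier
    return dist

# Toy example: a chain of four heavy atoms (0-1-2-3).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
pos_enc = laplacian_positional_encoding(adj, k=2)
hops = hop_distances(adj)
two_hop_mask = hops == 2  # atom pairs exactly two bonds apart
```

In a sketch like this, `pos_enc` would supply one row of positional features per atom, and boolean masks such as `two_hop_mask` could restrict or weight attention by hop distance. Whether KE-BERT uses these exact constructions is not stated in the abstract.
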
Keywords/Search Tags: Drug Discovery, Molecular Property Prediction, Self-Supervised Learning, BERT, Deep Learning, Interpretability