Research On Molecular Classification Algorithm Based On Graph Neural Network

Posted on: 2022-12-21; Degree: Master; Type: Thesis
Country: China; Candidate: J H Xiao; Full Text: PDF
GTID: 2480306761459744; Subject: Automation Technology
Abstract/Summary:
Throughout human history, disease has threatened human health and, in turn, the stability of society. Developing drugs that target specific diseases is therefore of great value. Knowledge of molecular properties helps researchers develop such drugs and design molecules with the desired functions, so determining those properties is a critical step in drug discovery. Computer-aided drug design has been applied to the computational prediction of molecular properties and has become one of the main research directions in bioinformatics. Classifying molecules is critical for screening drug candidates for specific diseases.

Traditional machine learning algorithms can classify molecules, but a molecule cannot be fed directly into such a model; a large number of experiments are required to derive a set of molecular descriptors first. These handcrafted features depend to some extent on the experience of the researchers. The traditional feature extraction strategy usually computes descriptors from the three-dimensional structure of the molecule and predicts its properties with a quantitative structure-activity relationship (QSAR) model, which is very time-consuming. Later, with the rise of deep learning, researchers took inspiration from convolutional neural networks and proposed graph convolutional neural networks. A molecule can naturally be regarded as a graph and used directly as input to a graph convolutional network. Graph convolutional neural networks learn representations of molecular structure directly from molecular datasets to predict molecular properties, and they classify molecules significantly better than traditional machine learning algorithms. However, the success of graph neural network models relies on large amounts of labeled data. Because characterizing molecules is difficult, labeled molecular datasets are relatively small, while a large amount of data remains unlabeled.

The main work of this thesis is as follows:

(1) To address these problems, and inspired by self-supervised learning, we propose a graph neural network model that integrates graph contrastive learning and uses the structural characteristics of molecules to learn node features. The graph neural network learns high-level features of the nodes in the molecular graph. Graph pooling then converts these node features into graph-level features, which are used to predict molecular properties. The graph contrastive learning component uses the information in the molecular data itself, through self-supervised learning, to improve the generalization ability of the model. The two tasks are combined and trained jointly.

(2) To verify the effectiveness of the proposed model, we first compare it with other methods on the BBBP and SIDER datasets. On the BBBP dataset, the average AUC of our model is 0.916, which is 7.8% higher than the recent TrimNet model and better than the other comparison methods. On the SIDER dataset, the average AUC of our model is 0.688, which is 4.7% higher than TrimNet and higher than all other comparison methods. We then verify the contribution of each module through ablation experiments. The results show that adding the graph contrastive learning task improves the model on every reported metric compared with the original model. We also explore how the number of graph neural network layers affects performance, and evaluate the model on datasets with different proportions of positive and negative samples. To address the imbalance between positive and negative samples in the Tox21 dataset, we replace the original loss function with the focal loss. The average AUC of the optimized model is 0.861, which exceeds the other comparison methods.

Finally, we summarize related work on molecular property classification and outline future work.
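As an illustration of the focal loss mentioned above for the imbalanced Tox21 data, the following is a minimal sketch in plain Python, not the thesis implementation. It follows the standard binary formulation, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the default values gamma = 2.0 and alpha = 0.25 are assumptions for illustration (they are the commonly used defaults for focal loss); the abstract does not state which values were used.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
        total += -math.log(p_t)
    return total / len(probs)


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss (hypothetical helper; gamma and alpha are assumed defaults).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified samples,
    so the many easy samples of the majority class contribute less to training.
    alpha weights the positive class and (1 - alpha) the negative class.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # Made-up predictions for one toxicity task: mostly easy negatives, one hard positive.
    probs = [0.05, 0.10, 0.02, 0.08, 0.30]
    labels = [0, 0, 0, 0, 1]
    print("cross-entropy:", binary_cross_entropy(probs, labels))
    print("focal loss:   ", binary_focal_loss(probs, labels))
```

With gamma = 0 the modulating factor disappears and the loss reduces to an alpha-weighted cross-entropy; larger gamma values concentrate the loss on hard, misclassified samples.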
Keywords/Search Tags: Molecular, Graph Convolutional Neural Networks, Graph Contrastive Learning