
Research On Metabolite Identification Method Based On Graph Convolutional Network

Posted on: 2022-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: M C Chu
GTID: 2480306329990729
Subject: Software engineering
Abstract/Summary:
Metabolomics is an important part of systems biology that analyzes the metabolites present in cells over a given period of time. These metabolites are mainly small molecules involved in cellular responses; they provide detailed information about the state of the cell and a basis for subsequent research. Mass spectrometry is a standard method for identifying metabolites and allows detailed analysis of chemical samples. Ionizing a sample produces a mixture of ions, which the mass analyzer in the mass spectrometer separates by mass-to-charge ratio to obtain a mass spectrum. Fragmentation trees can be constructed from mass spectra to explain the fragmentation process of an experimental sample. A fragmentation tree consists of a set of nodes connected by directed edges. Each node corresponds to a specific peak in the mass spectrum, represents a molecular fragment of the metabolite, and is annotated with a molecular formula. A directed edge between two nodes represents a fragmentation reaction in the mass spectrometer and is annotated with the molecular formula of the neutral loss.

A graph convolutional network is a kind of neural network for modeling and analyzing graph data. It generalizes the convolution operation used on regular data (such as images, text, and video) to the graph domain. A graph convolutional network uses both the feature information and the structural information of the nodes in a graph to learn latent node representations, which makes it an effective method for learning from graph data.

In this thesis, we propose Mol-GCN, a metabolite identification method based on graph convolutional networks. The method uses a heterogeneous graph convolutional network to model the metabolites in mass spectrometry data from the node and edge information contained in their fragmentation trees, and then identifies the metabolites. First, the model uses the fragmentation trees of the metabolites to construct a heterogeneous graph with two types of nodes: one type represents metabolites, and the other represents the molecular fragments of metabolites. Next, multi-label prediction is performed on the metabolite nodes of the graph to obtain the molecular fingerprint of each metabolite, a vector that encodes a variety of the metabolite's attributes. When the molecular library is queried, the fingerprints of the candidate molecules in the reference library are compared with the predicted fingerprint of the queried metabolite, and the candidates are ranked by similarity. Finally, the list of metabolites is determined from the ranking results. Unlike the kernel method and support vector machine used in the FingerID model, our model based on a graph convolutional network extracts features from the fragmentation trees automatically.

To verify the effectiveness of the model, we conducted comparative experiments against the existing metabolite identification model FingerID1.4 on two public data sets, Mass Bank1 and GNPS. On the Mass Bank1 data set, the proportion of correct molecular structure assignments (Top1) achieved by Mol-GCN improved by 3.85% (1.3 percentage points) compared with FingerID1.4. On the GNPS data set, it improved by 12.56% (2.7 percentage points) compared with FingerID1.4. These results verify the feasibility and effectiveness of the Mol-GCN model for metabolite identification. In addition, we evaluated the effect of three factors on the experimental results: the maximum mass difference allowed when generating the fragmentation trees, the connections among metabolite sample nodes, and the dimension of the molecular fingerprints. The evaluation shows that all three factors affect the results, which provides ideas for future experimental optimization. The work in this thesis verifies the effectiveness of graph convolutional networks in metabolite identification and provides new research ideas for molecular identification tasks in metabolomics.
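
The library query step described in the abstract (comparing a predicted fingerprint with candidate fingerprints and ranking by similarity) can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure Mol-GCN uses; the Tanimoto coefficient below is an assumption chosen because it is a common measure for binary fingerprints. The function names, candidate names, and 8-bit fingerprints are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto similarity of two equal-length binary fingerprints.

    Assumption: the thesis abstract only says "similarity"; Tanimoto is
    used here purely for illustration.
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    return both / either if either else 0.0


def rank_candidates(
    predicted: List[int], library: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the predicted fingerprint, best first."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical predicted fingerprint for one queried metabolite.
predicted_fp = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical reference library of candidate molecules.
candidate_library = {
    "candidate_A": [1, 0, 1, 1, 0, 0, 1, 0],
    "candidate_B": [1, 0, 1, 0, 0, 0, 0, 0],
    "candidate_C": [0, 1, 0, 0, 1, 1, 0, 1],
}

ranking = rank_candidates(predicted_fp, candidate_library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy setting, a Top1 assignment counts as correct when the first entry of `ranking` is the true structure of the queried metabolite.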
Keywords/Search Tags: Graph Convolutional Network, Metabolomics, Metabolite Identification, Fragmentation Tree, Molecular Fingerprint