Research On Molecular Generation Method Based On Flow Model

Posted on: 2022-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Liu
Full Text: PDF
GTID: 2510306614458424
Subject: Automation Technology
Abstract/Summary:
With the development of artificial intelligence and the maturing of its technology, deep learning has been widely applied across many fields, and computer science increasingly intersects with other sciences. Drug design and the generation of new molecules by artificial intelligence methods is a recent research direction and a typical example of research at the intersection of computer science and chemistry. The chemical space of molecules is very large, and the molecular structures discovered so far cover only a small part of it. To shorten research cycles and reduce development costs, many researchers have in recent years used deep learning to search for more potentially useful molecules and have proposed a large number of molecular generation models.

Molecules are represented in computers in two main ways: as a simplified molecular input line entry system (SMILES) string, or as a molecular graph data structure. A one-dimensional SMILES string inevitably loses information about the molecule's spatial structure, so we focus on the molecular graph representation. Among recent work on molecular graph data, flow-based models perform best and show the most potential, but several problems remain. First, most previous work models the molecular data directly and ignores the role of chemical substructures in composing the molecular graph. Second, most existing generative models treat molecular data as continuous, which requires a dequantization (anti-discretization) operation on inherently discrete molecular data, and they do not model the node information and edge information of molecules separately. As a result, the model cannot accurately learn the distribution of molecular data. Finally, in practice a new molecule is often synthesized from known molecules of similar structure through appropriate chemical reactions. This calls for a model that can generate new molecules with specified properties and that also provides the known molecule structurally similar to the new one. Most existing work on molecular generation either generates molecules randomly or optimizes specific indicators such as drug-likeness and synthesizability. It rarely edits molecules according to several chemical properties at once, and it does not provide a known molecule similar in structure to the new one, so it does little to reduce the difficulty of synthesizing new compounds.

To address these problems, we propose three flow-based generative models for molecular generation.

First, we propose CompMF, a molecular graph generation model based on a compressed flow. The original molecular graph is compressed using common chemical substructures found in molecular graphs, and a compressed flow model is built on the compressed graph structures. During generation, a compressed molecular representation is generated first and then restored to the correct molecular graph by inverting the compression process. Compression significantly reduces the dimension of the molecular representation, and the model input also encodes certain chemical rules, which further ensures the chemical validity of the generated molecules.

Second, we propose DisMF, a molecular graph generation model based on a discrete graph flow. The model builds the flow with a discrete method and models the node and edge information of molecules separately. Molecules are generated node by node and edge by edge in an autoregressive way. The model eliminates the dequantization of molecular data, avoids the computational cost of repeatedly evaluating Jacobian determinants, and models the node and edge information of molecules more accurately.

Finally, we propose AEMF, an attribute-editing flow model for molecular graph generation. Besides modeling the molecular data, it trains an attribute editing network, AttrEditor, which learns the mapping between changes in molecular attributes and changes in the molecules' latent-space encodings. During generation, the original molecule and the target attributes are given. From the difference between the target attributes and the attributes of the original molecule, the model computes latent-space codes for new molecules that have the target attributes and a similar molecular structure. The flow model then decodes these latent codes into molecules that meet the expectations.
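
The point that a discrete flow needs neither dequantization nor a Jacobian determinant can be illustrated with a minimal sketch. The code below is not the DisMF model from the thesis. It assumes a generic modular-shift transform of the kind used in discrete flows, with a hand-written toy function `shift_from_context` standing in for a learned network and a hypothetical vocabulary of four atom types. Because each step is a bijection on a finite label set, the likelihood of a label sequence equals the prior probability of its latent code, with no volume-correction term.

```python
from typing import List

NUM_ATOM_TYPES = 4  # hypothetical vocabulary size, e.g. C, N, O, F


def shift_from_context(context: List[int], k: int) -> int:
    """Toy stand-in for a learned network (assumption, not from the thesis):
    maps the labels generated so far to an integer shift in [0, k)."""
    return (sum(context) + len(context)) % k


def encode(labels: List[int], k: int = NUM_ATOM_TYPES) -> List[int]:
    """Map discrete labels to latent codes autoregressively.

    Each position is shifted modulo k by an amount that depends only on
    the earlier labels, so the whole map is invertible.
    """
    latents = []
    for i, x in enumerate(labels):
        mu = shift_from_context(labels[:i], k)
        latents.append((x + mu) % k)
    return latents


def decode(latents: List[int], k: int = NUM_ATOM_TYPES) -> List[int]:
    """Invert encode() one position at a time, as in autoregressive generation."""
    labels: List[int] = []
    for z in latents:
        mu = shift_from_context(labels, k)  # uses labels decoded so far
        labels.append((z - mu) % k)
    return labels


atoms = [0, 0, 2, 1, 3]      # discrete node labels, never made continuous
codes = encode(atoms)
assert decode(codes) == atoms  # exact round trip on discrete data
```

In a full model the same idea would be applied to node labels and edge labels with separate networks, and a categorical prior over the latent codes would be trained by maximum likelihood. Those details are not specified in the abstract and are left out here.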
Keywords/Search Tags: Conditional flow model, Discrete flow model, Autoregressive model, Molecule editing, Molecular graph generation