
Data-Driven Retrosynthesis Prediction Method Development Based On Machine Learning

Posted on: 2022-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: X R Wang
Full Text: PDF
GTID: 2491306491982149
Subject: Automation Technology
Abstract/Summary:
Organic chemistry is the foundation of many modern scientific fields and profoundly affects all aspects of our lives. Researchers have long sought ways to synthesize organic compounds more efficiently, and they have applied computers to synthesis planning since the 1960s, when computer technology first emerged. However, limited by the algorithms and computing power of the time, the computer-aided synthesis planning algorithms developed by these pioneers did not receive widespread attention from the academic community. Today, with greatly increased computing power and rapid progress in machine learning, computer-aided synthesis planning systems are developing quickly by integrating these new methods.

The retrosynthesis prediction algorithm is a core component of a computer-aided synthesis planning system. An accurate and reliable retrosynthesis prediction algorithm can save researchers considerable time and effort and reduce the probability of failure in synthesis exploration, thereby lowering research and development costs, which is of great importance to fields such as drug discovery. In recent years, hybrid expert-AI retrosynthesis prediction algorithms have helped researchers find reliable synthetic routes for complex natural products, but such algorithms rely on large teams manually encoding chemical reaction rules and synthetic strategies over decades, and they cannot be fully automated. Because chemical reaction knowledge is updated very quickly, encoding all chemical reactions by hand is extremely difficult. Purely data-driven retrosynthesis prediction algorithms are expected to learn chemical reaction rules automatically from massive data, freeing humans from tedious manual encoding.

This thesis studies data-driven retrosynthesis prediction algorithms based on machine learning. It proposes a new single-step retrosynthesis prediction model and evaluates multi-step retrosynthesis prediction algorithms in detail. The specific innovations and contributions are as follows:

1. A new data-driven single-step retrosynthesis prediction model, RetroPrime, is proposed. It is based on neural machine translation and follows the way chemists perform retrosynthetic analysis: (1) identify the reaction centers and convert the target molecule into synthon(s); (2) complete the leaving groups to obtain the reactant(s) corresponding to the target molecule. Both stages are carried out by Transformer models from natural language processing. On the standard USPTO-50K reaction dataset, RetroPrime achieves Top-1 accuracies of 64.8% and 51.4% when the reaction type is known and unknown, respectively. On the million-scale reaction dataset USPTO-full, its Top-1 accuracy is close to that of the state-of-the-art Transformer-based method. RetroPrime also includes targeted strategies for two problems of retrosynthesis prediction models based on neural translation, insufficient diversity of predictions and a high rate of chemically implausible predictions, which few researchers have addressed directly.

2. The influence of three search frameworks, of reaction filters, and of the reliability of the dataset used to train the single-step retrosynthesis model on the chemical plausibility of the planned synthesis routes is evaluated in detail. The retrosynthesis search tree is also abstracted as an AND-OR tree, which greatly improves the prediction performance of the benchmark algorithm, depth-first search. By testing up to 10 multi-step retrosynthesis model configurations on two test sets, three types of route planning problems of data-driven multi-step retrosynthesis models are summarized: (1) unreasonable predicted routes caused by quality problems in the raw data; (2) conflicts between reactions competing for the same site in the synthesis pathways of compounds with multiple reaction centers; and (3) reaction conditions of the planned pathway that are incompatible with the reactants. This work discusses in detail how the components of the multi-step retrosynthesis model affect these planning problems, and discusses methods for solving them.
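The AND-OR tree abstraction mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the function `solve`, the toy reaction table `TOY_REACTIONS`, the `PURCHASABLE` set, and the molecule names are all hypothetical, and the table lookup stands in for a learned single-step model. The sketch only shows the general idea: a molecule is an OR node (any one reaction that yields it suffices), a reaction is an AND node (all of its reactants must be solved), and depth-first search descends until it reaches purchasable building blocks.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy data: each product maps to candidate reactant lists.
# In a real system these candidates would come from a single-step model.
TOY_REACTIONS: Dict[str, List[List[str]]] = {
    "amide": [["acid", "amine"], ["acyl_chloride", "amine"]],
    "acyl_chloride": [["acid"]],
    "acid": [["ester"]],
}
PURCHASABLE: Set[str] = {"amine", "ester"}


def expand(molecule: str) -> List[List[str]]:
    """Return candidate reactant lists for a molecule (toy table lookup)."""
    return TOY_REACTIONS.get(molecule, [])


def solve(
    molecule: str,
    expand_fn: Callable[[str], List[List[str]]],
    purchasable: Set[str],
    max_depth: int,
) -> Optional[Dict[str, List[str]]]:
    """Depth-first search over an AND-OR tree.

    Returns a route mapping each intermediate molecule to the reactants
    chosen for it, or None if no route is found within max_depth steps.
    """
    if molecule in purchasable:
        return {}  # leaf: nothing left to synthesize
    if max_depth == 0:
        return None  # depth limit reached (also guards against cycles)
    # OR node: try each candidate reaction in turn.
    for reactants in expand_fn(molecule):
        route: Dict[str, List[str]] = {molecule: reactants}
        # AND node: every reactant must be solvable.
        for reactant in reactants:
            sub_route = solve(reactant, expand_fn, purchasable, max_depth - 1)
            if sub_route is None:
                break  # this reaction fails; move to the next candidate
            route.update(sub_route)
        else:
            return route  # all reactants solved
    return None


if __name__ == "__main__":
    print(solve("amide", expand, PURCHASABLE, max_depth=3))
```

With the toy table above, a depth limit of 3 yields the route `{'amide': ['acid', 'amine'], 'acid': ['ester']}`, while a depth limit of 1 returns `None` because neither candidate reaction for `amide` has all of its reactants purchasable. Reaction filters of the kind the abstract mentions could be applied, under the same assumptions, by discarding candidates inside `expand` before the search descends into them.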
Keywords/Search Tags: Computer-Aided Synthesis Planning, Retrosynthesis Predictions, Machine Learning, Natural Language Processing, Tree Search