Automatic Identification And Detection Of Atom-to-Atom Mapping Errors In Chemical Reaction Information Processing

Posted on: 2019-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: D J Ma
Full Text: PDF
GTID: 2428330545960951
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of computer technology and the growing use of machine learning, more and more disciplines conduct research by combining the two. Cheminformatics is an interdisciplinary field that applies computer and information science methods to chemical problems. The number of known compounds is growing exponentially and has reached about 18 million in recent years, so computational methods are needed to process and index this volume of chemical information. When chemical problems are handled automatically, for example when software performs Atom-to-Atom Mapping (AAM) for a chemical reaction, mapping errors can occur. At present, no system performs atom mapping completely and accurately. Because atom mapping underlies the modeling and prediction of compound properties in cheminformatics, detecting atom mapping errors automatically is very important.

Data in biology, medicine, and chemistry are usually very high-dimensional, strongly heterogeneous, and highly redundant, which makes them difficult to handle. Machine learning methods are needed to discover useful information and underlying regularities in such data. In areas such as computational chemistry and computational biology, machine learning has long been a standard tool. The Support Vector Machine (SVM) is an important pattern recognition method that is well suited to modeling nonlinear, high-dimensional, small-sample data, and it is the most widely used supervised learning algorithm in cheminformatics.

In this thesis, we use the Condensed Graph of Reaction (CGR) and SVM to design two schemes that automatically identify atom mapping errors. The main work and contributions are as follows:

1. Obtain the SMILES codes of chemical reactions from a chemical database used in practical applications, and generate the corresponding Molfile-format code with the package provided by the ChemAxon software. The resulting Molfile is then processed with the MarvinSketch software to generate the CGR of each reaction.

2. A computer-generated CGR is built from the atom mapping. Because current software cannot guarantee accurate automatic atom mapping, it can produce incorrect CGRs. The correctness of an atom mapping can therefore be assessed by using an SVM to judge whether the CGR is correct. By building a model that relates descriptors of CGR molecular structure fragments to the atom mapping of a reaction, the SVM prediction task is reduced to a binary classification problem (correct or incorrect atom mapping), which allows atom mapping errors to be identified automatically.

3. Design a relatively simple and efficient identification algorithm based on chemical bonds and chemical reaction mechanisms. From observation of incorrect CGR diagrams, combined with principles of chemical structure, the concept of a “valid bond” is proposed. Two conditions in the CGR diagram, intersecting “valid bonds” and abnormal carbon-carbon bond breaking, serve as screening criteria for automatically identifying and detecting atom mapping errors.
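The idea behind the rule-based screen can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not define the “valid bond” test, so only the carbon-carbon check is shown, and "abnormal" is simplified here to "any carbon-carbon bond present in the reactants but absent in the products", which is an assumption. The function names, the bond-dictionary representation, and the esterification example (acetic acid + methanol) are all hypothetical and chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Bonds keyed by the pair of atom-map numbers they connect; value = bond order.
Bonds = Dict[FrozenSet[int], int]


def condensed_graph(reactants: Bonds, products: Bonds) -> Dict[FrozenSet[int], Tuple[int, int]]:
    """Overlay both sides of a mapped reaction: pair -> (order before, order after)."""
    return {
        pair: (reactants.get(pair, 0), products.get(pair, 0))
        for pair in set(reactants) | set(products)
    }


def broken_cc_bonds(cgr: Dict[FrozenSet[int], Tuple[int, int]],
                    elements: Dict[int, str]) -> List[Tuple[int, int]]:
    """Return carbon-carbon pairs bonded in the reactants but not in the products.

    Simplifying assumption: every such break is treated as suspicious.
    """
    flagged = []
    for pair, (before, after) in cgr.items():
        a, b = sorted(pair)
        if before > 0 and after == 0 and elements[a] == "C" and elements[b] == "C":
            flagged.append((a, b))
    return sorted(flagged)


def bond(a: int, b: int) -> FrozenSet[int]:
    return frozenset((a, b))


# Heavy atoms only: 1 = CH3, 2 = carbonyl C, 3 = carbonyl O, 4 = acid OH,
# 5 = methanol C, 6 = methanol O.
elements = {1: "C", 2: "C", 3: "O", 4: "O", 5: "C", 6: "O"}
reactants = {bond(1, 2): 1, bond(2, 3): 2, bond(2, 4): 1, bond(5, 6): 1}

# Plausible mapping: only C2-O4 breaks and C2-O6 forms.
good_products = {bond(1, 2): 1, bond(2, 3): 2, bond(2, 6): 1, bond(5, 6): 1}

# Faulty mapping: the labels of the two methyl carbons (1 and 5) are swapped.
bad_products = {bond(2, 5): 1, bond(2, 3): 2, bond(2, 6): 1, bond(1, 6): 1}

good_flags = broken_cc_bonds(condensed_graph(reactants, good_products), elements)
bad_flags = broken_cc_bonds(condensed_graph(reactants, bad_products), elements)
print(good_flags, bad_flags)
```

In this toy case the swapped mapping makes the C1-C2 bond appear to break, so it is flagged, while the plausible mapping is not. Reactions that genuinely cleave carbon-carbon bonds would need a finer criterion than this sketch provides.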
Keywords/Search Tags:Condensed Graph of Reaction, Models for Machine learning, Atom-to-Atom Mapping, Support Vector Machine