| The automatic classification of chemical reactions is essential for the analysis of reaction databases, especially for metabolic reactions. Some studies in this area have been based on the identification of the reaction center. However, reaction-center information is not available in most databases, so the applicability of these methods is limited. The MOLMAP (molecular maps of atom-level properties) descriptor of a chemical reaction is the difference between the MOLMAP descriptors of the products and those of the reactants. The descriptor is obtained without prior assignment of the reaction center, and it has the potential to be widely used. In this thesis, we studied photochemical reactions and metabolic reactions using MOLMAP descriptors of chemical reactions. 1. Studies on classification of photochemical reactions. A data set of 356 photochemical reactions was extracted from the SPRESI database (InfoChem GmbH, Munich, Germany); each reaction consisted of two reactants and one product. These reactions were manually classified into seven types. MOLMAP descriptors of reactants, products, and reactions were derived from the structures of the reactants and products, which were represented by physicochemical and topological properties of the chemical bonds of the compounds. Three kinds of models were constructed with random forests: (1) a model to predict the type of reaction that the reactants undergo; (2) a model to predict the type of reaction by which the product can be synthesized; (3) a model to predict the type of the whole reaction. The results were better than those of our previous studies on the same data set. Therefore, the modified bond description helps improve the predictive ability of MOLMAP descriptors. To obtain more robust and accurate models, variable selection of the physicochemical and topological bond properties was performed with Weka. 
After variable selection, the selected subset of bond properties was used to generate MOLMAP descriptors, and the performance of this subset was also assessed. 2. Studies on classification of metabolic reactions. Enzymatic reactions were extracted from the KEGG LIGAND database. These reactions are classified into six classes, one of which comprises the reactions catalyzed by hydrolases. In this chapter, the data set was composed of 619 chemical reactions catalyzed by hydrolases, which were further classified into eight subclasses. Because the reverse reactions were also included, 1238 metabolic reactions were obtained. The class of each reaction was predicted automatically by random forests using MOLMAP descriptors of chemical reactions. In addition, variable selection of the bond properties used to generate MOLMAP descriptors was performed with Weka. The studies on variable selection are ongoing. |
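
The difference-based reaction descriptor defined above can be illustrated with a minimal Python sketch. It assumes that each molecular MOLMAP is already available as a fixed-length numeric vector and that the vectors of several molecules on one side of a reaction are summed element-wise; both are assumptions made for illustration. The function names `sum_maps` and `reaction_descriptor` and the four-element toy vectors are hypothetical and do not come from the thesis.

```python
from typing import List, Sequence

Vector = List[float]


def sum_maps(maps: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum of equal-length MOLMAP vectors (assumed aggregation)."""
    if not maps:
        raise ValueError("at least one MOLMAP vector is required")
    length = len(maps[0])
    if any(len(m) != length for m in maps):
        raise ValueError("all MOLMAP vectors must have the same length")
    return [sum(values) for values in zip(*maps)]


def reaction_descriptor(
    reactant_maps: Sequence[Sequence[float]],
    product_maps: Sequence[Sequence[float]],
) -> Vector:
    """Products minus reactants, computed element by element."""
    reactants = sum_maps(reactant_maps)
    products = sum_maps(product_maps)
    if len(reactants) != len(products):
        raise ValueError("reactant and product vectors differ in length")
    return [p - r for p, r in zip(products, reactants)]


# Toy vectors with made-up values: two reactants and one product,
# mirroring the shape of the photochemical reactions in the data set.
reactant_a = [1.0, 0.0, 2.0, 0.0]
reactant_b = [0.0, 1.0, 0.0, 0.0]
product = [1.0, 0.0, 1.0, 2.0]

forward = reaction_descriptor([reactant_a, reactant_b], [product])
# Swapping the two sides gives the descriptor of the reverse reaction.
reverse = reaction_descriptor([product], [reactant_a, reactant_b])

print(forward)
print(reverse)
```

Under this definition, swapping reactants and products simply negates the descriptor, which shows how including reverse reactions can double a data set (619 to 1238 reactions) without new structural input. No reaction-center assignment is needed at any step, since only whole-molecule vectors are used.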