
Application Of Feature Selection Methods To Spectrometric Identification Of Organic Compounds

Posted on: 2008-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhang
GTID: 2121360242963998
Subject: Chemical Biology
Abstract/Summary:
With the development of analytical science and computer technology, a large amount of chemical data containing abundant chemical information has been accumulated. Mathematical and chemometric techniques have been applied to interpret these data. However, accurately estimating the structural information of compounds from such large data sets remains a difficult problem for analysts.
Infrared spectrometry (IR) and mass spectrometry (MS) are two of the most widely used instrumental techniques in analytical chemistry, and large numbers of IR and MS spectra have been collected. In this thesis, computers were used to assist the identification of organic compounds by these two spectrometric techniques. Over the past ten years, mathematical transformation of spectrometric data and feature selection methods for dimensionality reduction have attracted many researchers in chemometrics.
Two spectral libraries were studied: the OMNIC FT-IR spectra library and the NIST Spectra Library Version 2.0a. The raw spectral variables were mathematically transformed into new spectral features. To reduce the number of features, feature selection methods such as Fisher ratios and genetic algorithm-partial least squares (GA-PLS) were applied to obtain optimal feature sets. Finally, five classification methods, K-nearest neighbor (KNN), support vector machine (SVM), classification and regression tree (CART), probabilistic neural network (PNN), and the AdaBoost algorithm combined with CART (AdaBoost-CART), were used to classify compounds according to their structures.
As a vibrational spectrometric technique, IR is well suited to distinguishing the cis and trans structures of alkenes. However, because of the influence of particular functional groups, researchers without sufficient analytical chemistry knowledge may find the spectra difficult to interpret. The IR part of this work therefore applies chemometric methods to classify the cis and trans structures of alkenes quickly and accurately.
For the data from the OMNIC IR spectral database, two feature selection methods, Fisher ratios and genetic algorithm-partial least squares (GA-PLS), and two classification methods, support vector machine (SVM) and probabilistic neural network (PNN), were used to build optimized classifiers. The results show that both the optimized SVM and PNN classifiers give good predictions of the cis and trans structures of alkenes, and that GA-PLS may be the more appropriate feature selection method for cis/trans classification.
Mass spectrometry (MS) is a commonly used instrumental technique for the characterization and identification of organic compounds. In a mass spectrometer, molecules of the investigated sample are ionized, the resulting ions are separated according to their mass-to-charge ratio (m/z, mostly z = 1), and their abundances are measured. Identifying compounds or automatically recognizing structural properties directly from mass spectral data appears to be more difficult. Mass spectral data are therefore first transformed into a series of new spectral features. Because each spectrum is then described by a large number of features, some features may be unnecessary and others may contribute only noise, so feature selection is required.
For mass spectra, two problems were studied: (1) application of feature selection and classification methods to predict four different chemical substructures of benzyloxy analogs, which occur in many medicines and pesticide intermediates; (2) classification of four kinds of pesticide structures from GB 4839-1998 (organochlorine, organophosphorus, carbamate, and pyrethroid pesticides) by combining feature selection methods with classification methods. The experimental results show that feature selection methods can improve the predictive ability of the classifiers.
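As an illustration of the Fisher-ratio feature selection named in the abstract, the following is a minimal sketch, not the thesis's own implementation. It assumes the common two-class definition F_j = (mean_1j - mean_2j)^2 / (var_1j + var_2j); the function names `fisher_ratios` and `select_top_features` and the synthetic data are hypothetical and stand in for the transformed spectral features, which the abstract does not specify.

```python
import numpy as np


def fisher_ratios(X, y):
    """Per-feature two-class Fisher ratio: (mean1 - mean2)^2 / (var1 + var2).

    Assumed definition; the thesis may use a different variant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = X[y == classes[0]]
    b = X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)
    # Small constant avoids division by zero for constant features.
    return num / (den + 1e-12)


def select_top_features(X, y, k):
    """Indices of the k features with the largest Fisher ratios."""
    scores = fisher_ratios(X, y)
    return np.argsort(scores)[::-1][:k]


# Synthetic stand-in for spectral features (hypothetical data):
# 100 samples, 20 features, two classes (e.g. cis vs. trans).
rng = np.random.default_rng(0)
n = 100
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 20))
X[y == 1, 3] += 3.0  # feature 3 separates the classes strongly
X[y == 1, 7] += 2.0  # feature 7 separates them moderately

top = select_top_features(X, y, k=2)
print("selected feature indices:", sorted(top.tolist()))
```

The selected columns, `X[:, top]`, would then be passed to a classifier such as KNN or SVM in place of the full feature set. Fisher ratios score each feature independently, whereas GA-PLS searches over feature subsets, which may be one reason the two methods behave differently in the reported results.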
Keywords/Search Tags:feature selection, classification method, organic spectroscopy, infrared spectra, mass spectra