The Application Of Gene Expression Programming In The Classification And QSAR Of Organic Compounds And Food

Posted on: 2015-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2298330431993828
Subject: Medicinal chemistry
Abstract/Summary:
Building on the advantages of genetic algorithms and genetic programming, the Portuguese scholar Cândida Ferreira first proposed gene expression programming (GEP) in 2001. GEP uses a simple encoding to solve complex problems and is efficient on a wide range of them; it is reported to outperform genetic programming by more than two orders of magnitude. Because it is simple to operate and performs well, GEP has been widely applied to formula discovery, function mining, association rule discovery, and many other fields.

Traditional GEP methods for classification are designed for binary (two-class) decision problems. Many GEP classifiers handle a multi-class classification problem as multiple two-class problems by using the one-against-all (OAA) learning method. In this thesis, a projection discriminant analysis for direct multi-class classification using GEP (GEPPDA) is put forward. The proposed GEPPDA algorithm was used to classify food data and the persistence of organic compounds. GEP was also applied to the quantitative structure-activity relationship (QSAR) of the toxicity of organic compounds. The specific work is as follows:

1. A projection discriminant analysis for direct multi-class classification using GEP (GEPPDA) is put forward in this thesis. GEP is first used to search for new synthetic variables, which are built as nonlinear combinations of the original features. The data are projected onto the planes spanned by these new synthetic variables. Nearest-centroid classification is then employed to classify new samples. A new objective function is formulated to determine the optimum synthetic variables. The proposed GEPPDA algorithm was used to classify persistence data of organic compounds. The results show that GEPPDA is an efficient tool for multi-class classification.
Visual inspection of high-dimensional data using GEPPDA facilitates the classification process and helps in understanding the data.

2. The GEP method was applied to the quantitative structure-activity relationship of the toxicity of organic compounds. Compared with a back-propagation artificial neural network (BP-ANN) and partial least squares (PLS), GEP gave better correlation coefficients for both the training and prediction sets. The GEP model is stable.

3. GEPPDA was applied to classify near-infrared spectra obtained from six tea varieties and data from Italian olive oils. Compared with the results of linear discriminant analysis (LDA) and the traditional GEP-OAA method, those obtained by GEPPDA were satisfactory.
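
The projection and nearest-centroid steps described for GEPPDA can be illustrated with a minimal Python sketch. This is not the thesis implementation: the two synthetic variables in `synthetic_variables` are hand-written, hypothetical stand-ins for expressions that GEP would evolve, the toy data are invented for illustration, and the thesis's objective function for selecting the synthetic variables is not shown because the abstract does not specify it.

```python
import numpy as np


def synthetic_variables(X):
    """Map original features to two synthetic variables.

    Hypothetical nonlinear combinations, assumed here for illustration;
    in GEPPDA such expressions would be found by GEP.
    """
    z1 = X[:, 0] * X[:, 1] - X[:, 2]
    z2 = np.sqrt(np.abs(X[:, 1])) + X[:, 2] ** 2
    return np.column_stack([z1, z2])


def fit_centroids(Z, y):
    """Compute one centroid per class in the projected plane."""
    labels = np.unique(y)
    centroids = np.vstack([Z[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict_nearest_centroid(Z, labels, centroids):
    """Assign each projected sample to the class of its nearest centroid."""
    # Pairwise Euclidean distances, shape (n_samples, n_classes).
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]


# Toy three-class data (invented for this sketch).
X_train = np.array([
    [1.0, 1.0, 0.0], [1.2, 0.9, 0.1],   # class 0
    [3.0, 3.0, 0.0], [3.1, 2.9, 0.2],   # class 1
    [0.0, 1.0, 3.0], [0.1, 1.1, 2.9],   # class 2
])
y_train = np.array([0, 0, 1, 1, 2, 2])
X_new = np.array([
    [1.1, 1.0, 0.05],
    [2.9, 3.0, 0.10],
    [0.05, 1.0, 3.0],
])

labels, centroids = fit_centroids(synthetic_variables(X_train), y_train)
predicted = predict_nearest_centroid(synthetic_variables(X_new), labels, centroids)
print(predicted)
```

Because all classes are handled in a single projected plane, no one-against-all decomposition is needed in this sketch, and the two-dimensional projection can also be plotted directly for visual inspection.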
Keywords/Search Tags: Gene expression programming, Multivariate classification, Quantitative structure-activity relationships, Toxicity