
Computer-assisted development of quantitative structure-property relationships and design of feature selection routines

Posted on: 1998-05-28
Degree: Ph.D.
Type: Thesis
University: The Pennsylvania State University
Candidate: Wessel, Matthew David
Full Text: PDF
GTID: 2468390014476532
Subject: Analytical Chemistry
Abstract/Summary:
Quantitative structure-property relationships (QSPR) can be used to develop models that accurately predict physical and chemical properties of organic compounds. If structure is assumed to be the primary influence on certain observed phenomena, then numerically encoding the structure with descriptors provides a mathematical link between structure and a measured property. Linear regression and neural networks are the primary methods employed to build the models. The use of these computational tools is presented in this thesis.

Two descriptor (feature) selection routines using the genetic algorithm are developed and presented, along with the theory of the genetic algorithm and its use for both linear and non-linear feature selection. The linear routine couples the genetic algorithm with linear regression. The non-linear routine couples the genetic algorithm with neural networks.

Three models are developed using linear regression and neural networks to predict reduced ion mobility constants K

Models that accurately predict normal boiling points for organic compounds containing have also been developed with regression and neural network methods. The prediction accuracy of the models developed for three separate sets of organic compounds is similar to the experimental error in all cases. Each method is also compared with a widely used group contribution method for boiling point estimation.

The genetic algorithm non-linear feature selection routine has been applied to a specific set of data to test its usefulness. A high-quality non-linear model is developed and presented. The behavior of computational neural networks has also been investigated and is presented in some detail. It is hoped that a better understanding of computational neural networks can be gained by extracting some of the behavioral patterns of the hidden neurons.
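The linear routine described in the abstract, a genetic algorithm coupled with linear regression, can be illustrated with a minimal sketch. The abstract does not state the thesis's actual fitness function, genetic operators, or parameters, so everything below is an assumption made for illustration: fitness is the RMS residual of an ordinary least-squares fit, selection keeps the better half of the population, crossover draws a child from the union of two parents, and mutation swaps one descriptor. The names `solve`, `rms_error`, `ga_select`, and `make_demo_data` are hypothetical, and the data are synthetic.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def rms_error(X, y, subset):
    """RMS residual of a least-squares fit (with intercept) on the chosen descriptor columns."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for i in range(p):
        ata[i][i] += 1e-9  # assumption: tiny ridge term to avoid a singular system
    coef = solve(ata, aty)
    sse = sum((sum(c * v for c, v in zip(coef, r)) - t) ** 2 for r, t in zip(rows, y))
    return (sse / len(y)) ** 0.5


def ga_select(X, y, k, pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    """Search for k descriptor columns that minimise the regression RMS error."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: rms_error(X, y, s))
        parents = pop[: pop_size // 2]  # truncation selection; the best subsets survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, k))  # crossover: draw k descriptors from both parents
            if rng.random() < mutation_rate:
                # mutation: drop one descriptor and add one not currently in the child
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice([j for j in range(n) if j not in child]))
            children.append(tuple(sorted(child)))
        pop = parents + children
    return min(pop, key=lambda s: rms_error(X, y, s))


def make_demo_data(n_samples=40, n_descriptors=8, seed=1):
    """Synthetic data: the property depends only on descriptor columns 1 and 4."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(n_descriptors)] for _ in range(n_samples)]
    y = [0.5 + 2.0 * row[1] - 3.0 * row[4] for row in X]
    return X, y
```

Calling `ga_select(X, y, k=2)` on the output of `make_demo_data()` returns a sorted tuple of two column indices. A non-linear variant in the spirit of the second routine would replace `rms_error` with the validation error of a small neural network, at a much higher cost per fitness evaluation.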
Keywords/Search Tags:Neural networks, Feature selection, Structure, Organic compounds, Genetic algorithm, Models, Routine