QSAR and classification modeling: Prediction of biological activity of organic compounds from molecular structure | Posted on: 2006-07-04 | Degree: Ph.D. | Type: Thesis | University: The Pennsylvania State University | Candidate: He, Linnan | Full Text: PDF | GTID: 2451390008960916 | Subject: Chemistry | Abstract/Summary:

With the recent development of combinatorial chemistry and high-throughput screening (HTS), molecules can be synthesized on a scale of tens to hundreds of thousands.

The aim of this research is to develop and apply computational methods that reduce the structure space, to achieve greater focus and improve the odds of identifying active drug candidates, and that screen virtual compounds or libraries, permitting the rapid and cost-effective elimination of poor candidates prior to synthesis. More specifically, we develop Quantitative Structure-Activity Relationships (QSARs). QSAR analysis studies, in an inductive way, the relationship between the molecular structure of organic compounds and their corresponding activity. This relationship is expressed by mathematical models that can predict the activity of organic compounds based solely on their chemical structure. Central to the construction of QSARs is the maximum extraction of information from datasets. Our ultimate goal is to obtain the most representative, robust, and reliable QSAR model from the dataset for the desired activity prediction. Besides quantitative activity prediction through quantitative QSAR models, we also study qualitative prediction, employing binary QSAR (classification models) to classify compounds into bioactive and inactive groups.

In the first part of the dissertation, the methodologies of QSAR model development, including binary QSAR models, are presented. Model development begins by encoding the chemical structure of each compound in a large number of calculated numerical descriptors.
This pool of descriptors is then reduced by feature selection to identify the subset of descriptors that carries the most information, so that the final selected descriptors are those that best map the biological activity. QSAR models are then developed from this reduced pool of descriptors with various algorithms. Multiple linear regression analysis and computational neural networks are used in quantitative QSAR model development. Classification methods, such as k-nearest neighbor, linear discriminant analysis, and the probabilistic neural network, are implemented in qualitative QSAR model development. The development process ends with model validation. All models are validated with an external prediction set to assess their prediction accuracy for unknown compounds. Randomization testing is also conducted to detect the possibility of chance correlation.

The second part of the dissertation is composed of several application studies. (Abstract shortened by UMI.)...

Keywords/Search Tags: QSAR, Prediction, Activity, Compounds, Development, Structure, Classification, Organic
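
The model-development workflow summarized in the abstract (descriptor pool, feature selection, multiple linear regression, external prediction set, randomization testing) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the data are synthetic, the feature selection is a simple correlation-ranking filter standing in for whatever selection method the thesis actually uses, and the function names (`make_synthetic_dataset`, `select_descriptors`, `build_and_validate`) are hypothetical.

```python
import numpy as np

def make_synthetic_dataset(n_compounds=300, n_descriptors=20, seed=1):
    """Hypothetical data: activity depends on descriptors 0, 1 and 2 only."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_compounds, n_descriptors))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.5 * X[:, 2]
    y = y + rng.normal(scale=0.1, size=n_compounds)
    return X, y

def select_descriptors(X, y, k):
    """Illustrative filter: keep the k descriptors with the largest |Pearson r|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(r))[:k]

def fit_mlr(X, y):
    """Multiple linear regression by least squares; coef[0] is the intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def r_squared(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def build_and_validate(X, y, k=3, n_external=60, n_scrambles=20, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    ext, train = idx[:n_external], idx[n_external:]

    # Feature selection and fitting use the training compounds only.
    cols = select_descriptors(X[train], y[train], k)
    coef = fit_mlr(X[train][:, cols], y[train])

    # External prediction set: compounds never seen during model building.
    r2_external = r_squared(y[ext], predict_mlr(coef, X[ext][:, cols]))

    # Randomization test: scramble the activities, repeat selection and
    # fitting, and record the training-set fit of each scrambled model.
    scrambled = []
    for _ in range(n_scrambles):
        y_perm = rng.permutation(y[train])
        c = select_descriptors(X[train], y_perm, k)
        b = fit_mlr(X[train][:, c], y_perm)
        scrambled.append(r_squared(y_perm, predict_mlr(b, X[train][:, c])))

    return cols, r2_external, float(np.mean(scrambled))

if __name__ == "__main__":
    X, y = make_synthetic_dataset()
    cols, r2_external, r2_scrambled = build_and_validate(X, y)
    print("selected descriptors:", sorted(int(c) for c in cols))
    print("external R^2:", round(float(r2_external), 3))
    print("mean scrambled R^2:", round(r2_scrambled, 3))
```

If the scrambled models fit about as well as the real one, the apparent structure-activity relationship is likely a chance correlation; a large gap between the two suggests the selected descriptors carry real information about activity.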