A Novel Variable Selection Method And The Application In QSAR Studies Of The Environmental Endocrine Disrupting Effect

Posted on: 2013-08-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z S Yi
GTID: 1221330467487868
Subject: Environmental safety and health
Abstract/Summary:
With the development of structural characterization techniques, variable selection plays a key role in screening molecular descriptors to establish predictive and robust quantitative structure-activity relationship (QSAR) models. Random search methods such as the genetic algorithm and the ant colony algorithm cannot guarantee that the same globally optimal subset is found in every run, while all-subsets regression is infeasible when the pool of candidate variables is large. To tackle these problems, novel variable selection and model validation methods were proposed and combined with molecular simulation to establish a new QSAR modeling technique. The new modeling method was applied in a QSAR study of the environmental endocrine disrupting effects of structurally diverse compounds.

1. Establishment of Uniform Design Cross Validation and Variable Selection Method Based on Variable Interaction

A novel variable selection method, the variable selection method based on variable interaction (VSMVI), was developed in the present study on the basis of three techniques: forward regression, the variable selection method based on prediction (VSMP), and the group method of data handling. The method solves the subset screening problem much faster than VSMP, which gives it an obvious advantage over VSMP when handling large sets of variables. Moreover, the performance of VSMVI in variable selection was assessed and verified on the Selwood dataset.

The characteristics of uniform design were adopted to modify leave-multiple-out cross validation (LMOCV) and establish a novel model validation method, uniform design cross validation (UDCV), which obtains as much information as possible with the minimum number of model validations. Using a uniform design table, the validation strategy is achieved by setting the sample number as a factor level and different [...] as different LMOCV splitting schemes.
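The abstract does not give the uniform design table or the exact UDCV procedure, so the following is only a minimal sketch of the underlying leave-multiple-out idea. It computes a cross-validated q² by repeatedly leaving out a fixed number of samples at random (Monte Carlo style) and refitting an ordinary least-squares model. The function names `fit_ols`, `predict_ols`, and `q2_lmo`, the use of ordinary least squares, and the synthetic data are all illustrative assumptions, not the author's implementation.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X (illustrative helper)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def q2_lmo(X, y, n_out, n_splits, rng):
    """q^2 from repeated random leave-n_out-out splits.

    q^2 = 1 - PRESS / SS, where PRESS sums squared prediction errors on the
    left-out samples and SS sums their squared deviations from the mean of
    the corresponding training set.
    """
    n = len(y)
    press = 0.0
    ss = 0.0
    for _ in range(n_splits):
        test = rng.choice(n, size=n_out, replace=False)
        train = np.setdiff1d(np.arange(n), test)
        coef = fit_ols(X[train], y[train])
        pred = predict_ols(coef, X[test])
        press += np.sum((y[test] - pred) ** 2)
        ss += np.sum((y[test] - y[train].mean()) ** 2)
    return 1.0 - press / ss


if __name__ == "__main__":
    # Synthetic descriptors and activities, purely for demonstration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=40)
    # Treat the number of left-out samples as a level to vary, loosely
    # echoing the "sample number as a factor level" idea in the text.
    for n_out in (2, 5, 10):
        print(n_out, q2_lmo(X, y, n_out, n_splits=50, rng=rng))
```

In this sketch the splits are drawn at random; a uniform-design scheme would instead choose the left-out group sizes and splits from a design table so that fewer validation runs cover the space more evenly.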
Furthermore, the performance and validity of VSMVI and UDCV were evaluated by comparison with the Monte Carlo cross validation method.

2. Method Application of VSMVI-UDCV Combination in QSAR Study on the Environmental Endocrine Disruptors

Four typical nuclear receptors, estrogen receptor (ER), androgen receptor (AR), progesterone receptor (PR), and peroxisome proliferator-activated receptor (PPAR), were selected as model targets for a QSAR study of receptor-mediated endocrine disrupting effects. Molecular descriptors were generated by E-Dragon, and the combination of VSMVI and UDCV was applied to choose optimal subsets and build robust QSAR models. The model applicability domain was validated and explained by molecular docking analysis of outliers.

(1) QSAR studies on ER mediated endocrine disruption effects

The validity of VSMVI and UDCV was verified through a QSAR modeling study on 2 groups of structurally diverse environmental estrogens. The statistical parameters of the 8-variable QSAR model based on 130 compounds are r² = 0.7370, q²_UDCV = 0.6376, and q²_LOOCV = 0.6990, while the correlation coefficients of the two external sample sets are r²_pre = 0.6815 and r²_pre = 0.5250, respectively. For the other dataset, with 58 compounds, the statistical parameters of the 5-variable QSAR model are r² = 0.8339, q²_UDCV = 0.7679, q²_LOOCV = 0.7702, and r²_pre = 0.7872. Molecular docking analysis indicated that the difference in ER binding mode is dominated by ligand structure diversity and may contribute to the outliers in the Williams plots. Moreover, the relatively compact ER ligand binding pocket leads to a pronounced clustering of the data toward the center along the X axis of the plot. The descriptors entering the models showed that the formation of hydrogen bonds is the predominant process in pollutant-ER interaction.

(2) QSAR studies on AR/PR mediated endocrine disruption effects

AR shares a quite similar 3D structure with PR, and its large hydrophobic cavity contributes to its low ligand selectivity.
The validity of VSMVI and UDCV was further verified through a QSAR modeling study on several groups of structurally diverse environmental androgens and progestogens. The statistical parameters of the 7-variable QSAR model based on 118 androgen-like compounds are r² = 0.6866, q²_UDCV = 0.6422, q²_LOOCV = 0.6620, and r²_pre = 0.5825. For progestogen-like compounds, the parameters for the 4H-benzo-furanone training set are r² = 0.7861, q²_UDCV = 0.5407, q²_LOOCV = 0.7127, and r²_pre = 0.4966, while those for the steroid compounds training set are r² = 0.8715, q²_UDCV = 0.7493, q²_LOOCV = 0.7915, and r_pre = 0.6686. Molecular docking analysis indicated that diverse ER binding modes derive from the ligand structure diversity, and that the relatively large AR/PR ligand binding pocket results in a clear dispersion of the data along the X axis of the plot. The descriptors entering the models showed that shape, size, and charge are the most important structural features for AR/PR-mediated effects.

(3) QSAR studies on PPARγ mediated endocrine disruption effects

PPARγ exhibits a unique 3D structure differing from that of ER, AR, and PR, in that it possesses a γ-shaped binding cavity. Therefore, a dispersion of the data along the X axis of the Williams plot can be expected. The validity of VSMVI and UDCV was also verified through a QSAR modeling study on two groups of structurally diverse PPAR agonists. The statistical parameters of the 10-variable QSAR model for PPAR binding affinity are r² = 0.8321, q²_UDCV = 0.7450, q²_LOOCV = 0.7900, and r²_pre = 0.5565, while those of the 14-variable QSAR model for the PPAR transactivation effect are r² = 0.6699, q²_UDCV = 0.4512, q²_LOOCV = 0.5837, and r²_pre = 0.3086.
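The abstract refers repeatedly to Williams plots and the spread of data along their X axis. As a rough illustration of what such a plot contains, the sketch below computes the two plotted quantities for a linear model: the leverage of each compound (X axis) and its standardized residual (Y axis), together with the commonly used warning leverage h* = 3(p + 1)/n. The function name `williams_points`, the simple residual scaling, and the synthetic data are assumptions for illustration; the dissertation's own conventions are not stated in the abstract.

```python
import numpy as np


def williams_points(X, y):
    """Leverages, standardized residuals and warning leverage for y ~ 1 + X.

    Illustrative helper: residuals are scaled by the residual standard
    deviation only (no per-sample leverage correction).
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    # Diagonal of the hat matrix A (A^T A)^-1 A^T gives each sample's leverage.
    G = np.linalg.pinv(A.T @ A)
    leverage = np.einsum("ij,jk,ik->i", A, G, A)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    s = np.sqrt(np.sum(resid ** 2) / (n - p - 1))  # residual standard deviation
    std_resid = resid / s
    h_star = 3.0 * (p + 1) / n                     # common warning leverage
    return leverage, std_resid, h_star


if __name__ == "__main__":
    # Synthetic data with one structurally extreme "compound" at index 0.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 2))
    X[0] = [10.0, 10.0]
    y = X @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=30)
    lev, sr, h_star = williams_points(X, y)
    outside = np.where((lev > h_star) | (np.abs(sr) > 3.0))[0]
    print("warning leverage:", h_star)
    print("samples outside the applicability domain:", outside)
```

Compounds with leverage above h* are structurally far from the training set, and those with standardized residuals beyond about ±3 are response outliers; both kinds are the natural candidates for the docking analysis of outliers described in the text.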
Keywords/Search Tags: Variable Selection Method based on Variable Interaction, Uniform Design Cross Validation, Quantitative Structure-Activity Relationship, Endocrine Disrupting Effect, Molecular Docking, Estrogen Receptor, Androgen Receptor, Progesterone Receptor