
In Silico Prediction Of Chemical Plasma Protein Binding And Endocrine Disruption

Posted on: 2019-01-20  Degree: Master  Type: Thesis
Country: China  Candidate: L X Sun  Full Text: PDF
GTID: 2348330548462399  Subject: Pharmacy
Abstract/Summary:
Adverse ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties are a major cause of costly failures in drug development. As the capacity of chemical synthesis and biological screening has advanced dramatically, there is an urgent demand for abundant ADMET data in early drug discovery. Numerous medium- and high-throughput in vitro ADMET screening methods are now in wide use. Nevertheless, computational approaches, which are more efficient and less costly, can significantly enhance our capability to predict pharmacokinetic and toxicity-related endpoints, thereby accelerating the pace of drug discovery. In this thesis, we used machine learning methods to predict chemical plasma protein binding and endocrine disruption potential. The specific contents are as follows.

Chapter one introduced in detail the procedures for constructing quantitative/qualitative structure-activity relationship (QSAR) models, the primary in silico methods used in this thesis. This chapter covered methods of data processing, molecular structure representation, machine learning, and model evaluation for regression, single-label, and multi-label models.

Plasma protein binding (PPB), an important pharmacokinetic and toxicokinetic property of drugs, plays a significant role in drug design and development. In the second chapter, regression models were built using 6 machine learning algorithms combined with 26 molecular descriptors to predict the PPB fraction of chemicals. In addition, two consensus modeling strategies were employed, and the resulting consensus models marginally outperformed the individual ones. All the models were validated by ten-fold cross-validation and by three test sets comprising 242 pharmaceutical, 397 environmental, and 231 newly designed chemicals, respectively. The models showed reliable performance across all test sets, with mean absolute error (MAE) ranging from 0.126 to 0.178. Given the experimental uncertainty of 0.061, which was estimated from data obtained with different assay techniques, and the response scale of 0-1 for PPB, we believe that our models achieved reasonable performance. Moreover, the key molecular descriptors used in the study were analyzed to explain the models, and a reasonable applicability domain was defined to promote model utility.

Endocrine disruption (ED) has become a serious public health issue and is also one of the adverse effects of drugs. Hence, toxicology research on ED is urgently needed for chemicals in wide use. In chapter three, we built models for the prediction of ED. First, modulator datasets for six ED targets were collected from Tox21 of the U.S. Environmental Protection Agency. After data processing and integration, we obtained a multi-label training set and test set containing 294 and 73 chemicals, respectively, as well as a single-label training set and test set for each target. We then constructed multi-target multi-label models and single-label models for the prediction of endocrine disruptors. To cope with data imbalance, the single-label models for each target were constructed by combining multiple random under-samplers with voting classification, and the multi-label models were developed using five multi-label algorithms combined with twelve molecular fingerprints. The single-label models for each target yielded accurate predictions, and Label Powerset, a multi-label method that can take the dependency between labels into account, consistently achieved the best performance. To compare the single-label and multi-label models, we combined the best single-label model of each target to produce multi-label predictions. The results demonstrated that the accuracy of the multi-label models was significantly superior to that of the combined single-label ones. Furthermore, we found that single-label models tend to predict the negative samples in the multi-label dataset as positive, and that our multi-label models overcame this defect. Overall, we conclude that it is feasible to improve model predictivity for endocrine disrupting chemicals by using multi-label models that take label dependency information into account.

In the last chapter, a summary of the whole thesis was provided.
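
As an illustration of the Label Powerset idea referred to in the abstract, the following minimal sketch shows the underlying transformation: every distinct combination of target labels is treated as one class, so a single multi-class classifier learns which labels occur together. This is not the thesis's implementation; the function names and the toy label matrix are hypothetical and chosen only for illustration.

```python
def to_powerset_classes(label_rows):
    """Map each distinct label combination to one class id.

    label_rows: list of equal-length 0/1 lists, one row per chemical.
    Returns the class id per row and the combination-to-class mapping.
    """
    combo_to_class = {}
    classes = []
    for row in label_rows:
        combo = tuple(row)
        # Assign a new class id the first time a combination is seen.
        if combo not in combo_to_class:
            combo_to_class[combo] = len(combo_to_class)
        classes.append(combo_to_class[combo])
    return classes, combo_to_class


def from_powerset_class(class_id, combo_to_class):
    """Recover the multi-label vector for a predicted class id."""
    class_to_combo = {c: combo for combo, c in combo_to_class.items()}
    return list(class_to_combo[class_id])


# Hypothetical toy data: 5 chemicals, 3 targets (not taken from the thesis).
Y = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]

classes, mapping = to_powerset_classes(Y)
print(classes)                           # one class id per chemical
print(from_powerset_class(1, mapping))   # label vector behind class 1
```

Any multi-class classifier can then be trained on `classes`, and its predictions are mapped back to label vectors with `from_powerset_class`. Because a prediction is always a label combination seen in training, dependencies between labels are preserved; the trade-off is that unseen combinations cannot be predicted.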
Keywords/Search Tags: QSAR, plasma protein binding, endocrine disrupting, multi-label model