Research On Fusion Models Of Compound Genotoxicity Based On Different Molecular Descriptors

Posted on: 2022-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: X T Yang
GTID: 2504306554959339
Subject: Public Health
Abstract/Summary:
The genotoxicity of compounds has serious adverse effects on humans and their offspring. Because no single assay can detect all the toxicity mechanisms involved in genotoxicity, a combination of experiments is needed for a comprehensive evaluation. However, traditional genotoxicity testing cannot meet the demand for testing large numbers of compounds because of its long cycle, high cost, and the ethical problems of animal experiments, so computer models are increasingly used to predict compound toxicity. At present, most mutagenicity prediction models are based on a single experimental type. They lack supporting data from other experiments and do not conform to the weight-of-evidence principle, which limits their scope of application and their predictive ability.

Objective: 1. To provide a basis for follow-up research by establishing a new database covering a variety of experimental types and detection endpoints, such as in vitro experiments and prokaryotic and eukaryotic cell experiments. 2. To compare the performance of QSAR models based on different molecular descriptors, and to provide a reference for selecting molecular descriptors in compound genotoxicity prediction models. 3. To compare single models with fusion models, in order to identify better modeling strategies and obtain better QSAR models.

Methods: Compound structures and their genotoxicity test data were collected from three databases, GENE-TOX, CPDB, and CCRIS, to establish a new data set that includes in vivo and in vitro experiments as well as prokaryotic and eukaryotic cell experiments. Following the guidelines issued by the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH), the technical guidelines for genotoxicity research issued by the National Medical Products Administration (NMPA), and the weight-of-evidence method for genotoxicity judgments, the different genotoxicity experiments were combined into groups. The experimental results were divided into three groups, ensuring that every compound had complete results in each group. Three algorithms, random forest (RF), support vector machine (SVM), and BP neural network, were used to establish and validate the QSAR models. A total of nine sub-models were established (three algorithms applied to three experimental groups). Finally, for each algorithm, the three sub-models built on the different experimental groups were fused. With this basic design unchanged, two types of molecular descriptors, quantitative and qualitative, were screened by SHAP values and incorporated into the models, and the performance of the resulting sub-models and fusion models was compared.

Results: Overall, the models built on substructure descriptors, namely PubChem molecular fingerprints, achieved better results. Under five-fold cross-validation, the three fusion models reached prediction accuracies of 83.4%, 80.5%, and 79.0%, with corresponding AUC values of 0.853, 0.897, and 0.865. Of the three algorithms, random forest achieved good prediction results in both the sub-models and the fusion model, whereas the performance of the support vector machine and neural network algorithms varied. Given that the RF algorithm has also performed well in other studies, it is recommended for QSAR modeling of mutagenicity. Under each algorithm, the fusion model established according to the weight-of-evidence method predicted compound mutagenicity better than the sub-models. The fusion models in this study also achieved better prediction accuracy than models in similar studies that used only Ames test results.

Conclusion: Although different types of molecular descriptors carry different meanings for QSAR research, substructure descriptors such as PubChem molecular fingerprints are more suitable for QSAR modeling of genotoxicity data. A QSAR fusion model that is based on experimental data from multiple genotoxicity test endpoints and applies the weight-of-evidence principle is more effective than a single model, allowing a more comprehensive and accurate judgment of whether a compound is genotoxic. It can be used as an early warning system for the potential hazards of compounds and can act as a "sentinel" in the detection of compound genotoxicity.
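
The fusion of sub-models described in the Methods can be illustrated with a minimal sketch. The abstract does not state the exact fusion rule, so the two rules shown here (averaging the sub-model probabilities, and flagging a compound when any experimental group is predicted positive) are assumptions chosen for illustration. The function names, the 0.5 threshold, and the probability values are likewise hypothetical and are not results from the thesis.

```python
from statistics import mean


def fuse_mean(probs):
    """Soft voting (assumed rule): average the positive-class probabilities
    produced by the sub-models of one algorithm."""
    return mean(probs)


def fuse_any_positive(probs, threshold=0.5):
    """Conservative rule (assumed): call the compound genotoxic if the
    sub-model for any experimental group predicts positive."""
    return any(p >= threshold for p in probs)


def accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up probabilities for four compounds. Each tuple holds the outputs of
# three sub-models (one per experimental group) trained with the same algorithm.
sub_model_probs = [
    (0.90, 0.75, 0.60),
    (0.20, 0.25, 0.30),
    (0.25, 0.75, 0.35),
    (0.125, 0.125, 0.5),
]
labels = [1, 0, 1, 0]  # hypothetical genotoxicity labels (1 = genotoxic)

mean_calls = [int(fuse_mean(p) >= 0.5) for p in sub_model_probs]
any_calls = [int(fuse_any_positive(p)) for p in sub_model_probs]

print("mean fusion:", mean_calls, accuracy(labels, mean_calls))
print("any-positive fusion:", any_calls, accuracy(labels, any_calls))
```

The two rules err in opposite directions on this toy data: averaging misses a compound that only one group flags, while the any-positive rule flags a borderline compound. A real fusion rule would have to be chosen and validated against the cross-validation metrics reported above.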
Keywords/Search Tags: quantitative structure-activity relationship, genotoxicity, weight-of-evidence method, molecular descriptor, molecular fingerprint