Prediction of hERG Potassium Channel Blockage Using Ensemble Learning Methods and Molecular Fingerprints

Posted on: 2022-04-09; Degree: Master; Type: Thesis
Country: China; Candidate: M Liu
GTID: 2518306314493194; Subject: Biomedical statistics
Abstract/Summary:
As computer technology has matured, computer-aided drug design has come into wide use. Computational methods can raise the success rate of experiments, save considerable time, money, and labor, and open up more room for new drug research and development. Computer-aided drug design has become a research hotspot in human health. This study therefore applies machine learning algorithms based on compound structure to cardiotoxicity induced by organic compounds, with the aim of building a new prediction model with more accurate performance. In predicting such cardiotoxicity, the half-maximal inhibitory concentration (IC50) for the hERG channel, a marker of potassium channel blockade, is a critical parameter.

For the classification model, 9 molecular fingerprints and 55 2D molecular descriptors are calculated for a data set of 1865 different compounds. Three machine learning algorithms are then applied to the calculated fingerprints and descriptors to build 30 models of compound-induced cardiotoxicity, which are combined into ensemble models. Among the classification results, the Ensemble-top7 model performs best in five-fold cross-validation, with an accuracy of 84.9% and an AUC of 0.887. On an external validation data set, the ensemble model reaches an accuracy of 85.0% and an AUC of 0.786. Compared with the models reported in the literature, the Ensemble-top7 model offers higher prediction performance.

For the regression model, 55 2D molecular descriptors are calculated for a data set of 1631 different compounds, and three regression models are built with the same three machine learning algorithms. In five-fold cross-validation, the R² of the training set is 0.576, and the R²pred values of the three validation sets are 0.549, 0.681, and 0.711, respectively. These results indicate that the regression model can predict IC50 values well and can be applied effectively to new compounds. In addition, random forest models were used to identify several chemical structures associated with compound cardiotoxicity. This information is likely to be valuable for future drug development and design.

Innovative work of this study: based on compound structure, three machine learning algorithms are used to build a prediction model for compound-induced cardiotoxicity. Compared with the modeling methods reported in the literature, the ensemble model achieves higher predictive performance. It shows no obvious overfitting and can be used to predict the cardiotoxicity of compounds.
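
The abstract does not spell out how the Ensemble-top7 model is assembled. As a minimal sketch, assuming that "top7" means the seven base models with the highest cross-validated AUC and that their predicted probabilities are averaged (soft voting), the selection and scoring steps might look like the following in plain Python. The model names, probabilities, and AUC values are toy placeholders, not data from the thesis, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of compounds classified correctly at the given probability threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def select_top_k(cv_auc: Dict[str, float], k: int) -> List[str]:
    """Names of the k base models with the highest cross-validated AUC (assumed criterion)."""
    if not 1 <= k <= len(cv_auc):
        raise ValueError("k must be between 1 and the number of base models")
    return sorted(cv_auc, key=cv_auc.get, reverse=True)[:k]


def average_probs(model_probs: Dict[str, List[float]], names: Sequence[str]) -> List[float]:
    """Soft voting: mean predicted blocker probability across the selected models."""
    n = len(model_probs[names[0]])
    return [sum(model_probs[m][i] for m in names) / len(names) for i in range(n)]


# Toy placeholders: 1 = hERG blocker, 0 = non-blocker (not thesis data).
labels = [1, 0, 1, 0]
model_probs = {
    "model_a": [0.9, 0.2, 0.6, 0.4],
    "model_b": [0.7, 0.4, 0.8, 0.1],
    "model_c": [0.4, 0.6, 0.5, 0.5],
}
cv_auc = {"model_a": 0.86, "model_b": 0.84, "model_c": 0.70}

selected = select_top_k(cv_auc, k=2)  # the thesis uses k=7 over its own base models
ensemble = average_probs(model_probs, selected)
print(selected, roc_auc(labels, ensemble), accuracy(labels, ensemble))
```

In a real pipeline the per-model probabilities would come from held-out folds of the five-fold cross-validation, so that model selection and ensemble scoring are not done on data the base models were trained on.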
Keywords/Search Tags: machine learning, molecular fingerprinting, molecular descriptors, cardiotoxicity, ensemble models