| As a factor in absorption, the blood-brain barrier (BBB) protects the central nervous system (CNS) by separating brain tissue from the blood. In recent years, BBB permeability has become a key issue in predicting the absorption, distribution, metabolism and excretion (ADME) of chemicals. Traditional experiments are expensive and time-consuming, and have become the main bottleneck in high-throughput screening of macromolecular libraries. Various in silico prediction models have now been developed to help filter compounds and predict their ADME and toxicity (ADMET) characteristics. Since models obtained by ensemble learning outperform their base classifiers, this study developed an ensemble model to predict the BBB permeability of compounds using three machine learning algorithms and nine molecular fingerprints, with the synthetic minority oversampling technique (SMOTE) used to address data imbalance. In 5-fold cross-validation, the Ensemble Top-9 model achieved the best predictive performance, with an accuracy (ACC) of 0.930, an area under the receiver operating characteristic (ROC) curve (AUC) of 0.966, a sensitivity (SEN) of 0.964 and a specificity (SPE) of 0.839; on the external validation set, it achieved an AUC of 0.849, an ACC of 0.784, a SEN of 0.812 and an SPE of 0.712. This model may have high predictive performance for new molecules and can be used for early screening of CNS drugs. Because P-glycoprotein (P-gp) is expressed at high levels in the BBB, it prevents potential CNS drugs from entering the CNS. We therefore used three machine learning algorithms and nine molecular fingerprints to build and test ensemble models for predicting P-gp substrates and inhibitors, in order to determine whether a drug is a P-gp substrate or inhibitor. For the prediction
of P-gp substrates, the best model is Ensemble Top-5, with an AUC of 0.840 and an ACC of 0.759 in 5-fold cross-validation, and an AUC of 0.838 and an ACC of 0.760 on the external test set. For the prediction of P-gp inhibitors, the best model is Ensemble Top-9, with an AUC of 0.918 and an ACC of 0.849 in 5-fold cross-validation, and an AUC of 0.835 and an ACC of 0.782 on the external test set. The AUC and ACC of the ensemble models are higher than those of the base classifiers built on the same training set, indicating that ensemble learning can improve predictive performance. Compared with the P-gp substrate and inhibitor prediction methods reported in the literature in recent years, our best ensemble models achieved higher AUC and ACC. In summary, the innovations of this study are twofold: (1) more stringent de-duplication of the data sets used in the study, in which stereoisomeric duplicate molecules were removed; (2) simultaneous construction of three classification models (a compound BBB permeability model, a P-gp substrate model and a P-gp inhibitor model), which allowed the BBB permeability of compounds to be explored in more detail. The resulting models showed high and stable performance. |
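
As an illustration of the Top-k ensembles and the metrics reported above, the following is a minimal Python sketch, not the study's implementation. It assumes that an Ensemble Top-k model averages the positive-class probabilities of the k base models with the highest cross-validation AUC and applies a 0.5 threshold; the abstract does not state the actual combination rule. The function names, model names, probabilities and AUC values are hypothetical toy data.

```python
from statistics import mean


def confusion_metrics(y_true, y_pred):
    """Return (ACC, SEN, SPE) for binary labels, with 1 as the positive class.

    Assumes both classes occur in y_true, otherwise SEN or SPE is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sen = tp / (tp + fn)  # sensitivity: fraction of positives recovered
    spe = tn / (tn + fp)  # specificity: fraction of negatives recovered
    return acc, sen, spe


def top_k_ensemble(probs_by_model, cv_auc, k, threshold=0.5):
    """Average the probabilities of the k models with the highest CV AUC.

    Assumption: simple (unweighted) probability averaging; other rules such
    as majority voting would be equally compatible with the abstract.
    """
    ranked = sorted(cv_auc, key=cv_auc.get, reverse=True)[:k]
    n_samples = len(probs_by_model[ranked[0]])
    avg = [mean(probs_by_model[m][i] for m in ranked) for i in range(n_samples)]
    labels = [1 if p >= threshold else 0 for p in avg]
    return ranked, avg, labels


# Hypothetical toy data: three base models scored on four compounds.
cv_auc = {"model_a": 0.90, "model_b": 0.85, "model_c": 0.70}
probs_by_model = {
    "model_a": [0.9, 0.2, 0.6, 0.4],
    "model_b": [0.7, 0.4, 0.6, 0.8],
    "model_c": [0.1, 0.9, 0.9, 0.1],
}
y_true = [1, 0, 1, 0]

ranked, avg, labels = top_k_ensemble(probs_by_model, cv_auc, k=2)
acc, sen, spe = confusion_metrics(y_true, labels)
print(ranked, labels, acc, sen, spe)
```

With k=2 the lowest-ranked model is excluded from the average, which is the sense in which a Top-k ensemble is taken here. AUC is computed from the averaged probabilities rather than the thresholded labels and is omitted from this sketch.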