| Heat shock protein 90 (Hsp90) is one of the most abundant intracellular proteins in mammalian cells. It is specifically and highly expressed when the body is stimulated by heat shock. Studies have found that Hsp90 simultaneously regulates proliferation, invasion, metastasis, angiogenesis, and anti-apoptosis. Because Hsp90 regulates multiple cancer pathways, it is increasingly recognized as an important anti-cancer molecular target. Hepatitis B virus (HBV) infection is a global health problem, but current vaccines and antiviral drugs cannot eliminate HBV. The HBV core protein plays an important role in multiple steps of the viral life cycle and can affect HBV replication; it has therefore become an important target for HBV-specific antiviral drugs. This work focuses on two topics: Hsp90 inhibitors and HBV capsid protein assembly modulators. Hsp90 inhibitors reported in the literature were comprehensively collected, and a variety of machine learning algorithms were used to build classification and quantitative models of their biological activity for structure-activity relationship studies. With the HBV capsid protein as the target, pharmacophore modeling and molecular docking were used to carry out virtual screening of compound libraries. The main research contents are as follows: (1) Using a variety of machine learning algorithms, classification models for high and low activity of Hsp90 inhibitors were constructed. A total of 1,321 Hsp90 inhibitors reported in the literature were collected as the dataset for the classification models, and six fingerprint descriptors were used to characterize the structures of the compounds. By combining these descriptors with five machine learning methods (support vector machine, random forest, multi-layer perceptron, decision tree, and gradient boosting tree), 20 classification models for predicting the biological activity of Hsp90 inhibitors were established. Among them, the model combining the ECFP4 fingerprint descriptor with the SVM algorithm (model 2A) performed best, with a prediction accuracy
of 91.02% on the test set and an MCC value of 0.80. Information gain analysis of the ECFP4 fingerprint descriptors identified the descriptor features that are important for Hsp90 inhibitory activity, as well as the ECFP4 substructures that contribute positively or negatively to activity. In addition, the K-Means algorithm was used to group the 1,321 compounds into eight subclasses, and the structural characteristics of each subclass were summarized. (2) Using the support vector machine and random forest methods, quantitative prediction models for the biological activity of Hsp90 inhibitors were constructed. A set of 305 small-molecule Hsp90 inhibitors was collected as the dataset for the quantitative models, and two sets of molecular descriptors (Corina, RDKit) were used to characterize the structures of the compounds. By combining these descriptors with the support vector machine and random forest algorithms, 12 quantitative regression prediction models were established. Among them, the model combining the Corina descriptors with the random forest algorithm (model B1) performed best, with a correlation coefficient of 0.80 on the test set and an MSE value of 0.37. Analysis of the descriptors showed that the number of bonds, molecular complexity, hydrophobicity, and electronegativity have important effects on biological activity. (3) With the HBV core protein as the target, pharmacophore-based screening using a model derived from a core protein-ligand complex, rigid molecular docking with LibDock, and semi-flexible molecular docking with CDOCKER were performed on two databases, Specs and ChemDiv. Through this hierarchical virtual screening workflow, 71 candidate compounds were obtained, and after clustering of their scaffolds, 10 novel compounds were finally selected. Molecular docking interaction analysis of two of these compounds, C522-1961 and C289-0097, showed that they form favorable interactions at the binding site of the HBV core protein. |
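
The models above are compared by accuracy and MCC (classification) and by correlation coefficient and MSE (regression). The following is a minimal sketch of the standard definitions of these four metrics, not the evaluation code used in this work. It assumes binary labels coded as 1 (high activity) and 0 (low activity), and it assumes that the reported correlation coefficient is the Pearson coefficient, which the abstract does not state. All function names and the toy data are hypothetical.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = high, 0 = low)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted activity values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed meaning of 'correlation coefficient')."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    labels_true = [1, 1, 0, 0]
    labels_pred = [1, 0, 0, 0]
    print("accuracy:", accuracy(labels_true, labels_pred))
    print("MCC:", mcc(labels_true, labels_pred))

    activity_true = [6.1, 7.4, 5.2, 8.0]
    activity_pred = [6.0, 7.0, 5.9, 7.6]
    print("MSE:", mse(activity_true, activity_pred))
    print("Pearson r:", pearson_r(activity_true, activity_pred))
```

MCC is reported alongside accuracy because it uses all four cells of the confusion matrix, so it stays informative when the high-activity and low-activity classes are unevenly sized.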