| Virtual screening (VS) is a fast and economical method for drug discovery that aims to identify potential active compounds in a large compound library. It is divided into structure-based virtual screening (SBVS) and ligand-based virtual screening (LBVS), both of which depend on high-quality compound libraries. However, as the number of virtual compounds has expanded rapidly, compound databases have grown larger and accumulated a large amount of repetitive, unnecessary molecular information. These redundant molecules increase the time needed to screen for active compounds, so removing them has become a major challenge in virtual screening for drug discovery. In this study, a core compound library was obtained through molecular clustering, an effective way to construct high-quality compound libraries: similar compounds are grouped into clusters, and once clustering is complete, representative molecules are selected from each cluster to form the core library, which reduces redundancy. Furthermore, a structure-based virtual screening process built on this core library was established to accelerate drug discovery. The research in this paper covers four aspects: preparation of the virtual screening database, construction of clustering models, evaluation of the clustered core library, and structure-based molecular docking. First, starting from seven commonly used virtual screening libraries, the downloaded small molecules were rapidly filtered, yielding a total of 380,911,984 small molecules that met the requirements. Then, candidate clustering methods were evaluated using three clustering evaluation indexes, and the method best suited to this work was selected. For comparison, two methods were used for clustering, producing Cluster Model 1 and Cluster Model 2. We extracted the core molecules from both models, resulting in
Chemical Core_1 and Chemical Core_2, which contained 623,384 and 600,000 small molecules, respectively. The scaffold diversity and chemical space coverage of Chemical Core_1 and Chemical Core_2 were compared using CSFP and PMI, and the results showed that Chemical Core_1 had better scaffold diversity than Chemical Core_2. Finally, we used Chemical Core_1 as the virtual screening database and developed a structure-based virtual screening process using molecular docking, which showed improved docking performance. In summary, this study integrates seven virtual screening databases and derives the Chemical Core_1 database by clustering, which narrows the scope of virtual screening. It also designs a structure-based virtual screening workflow that docks compounds against target proteins to accelerate drug development. |
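
The cluster-then-select idea summarized above can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, fingerprint, or similarity threshold that was used, so everything here is an assumption for illustration: molecules are represented as sets of on-bit fingerprint indices, similarity is the Tanimoto coefficient, clustering is a simple leader-style pass, and the names `tanimoto`, `leader_cluster`, `core_library`, and the toy molecules are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Assumption: each molecule is described by the set of "on" bits
# of some binary fingerprint (the abstract does not say which one).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared bits divided by bits set in either."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Dict[str, Fingerprint],
                   threshold: float) -> Dict[str, List[str]]:
    """Leader-style clustering (an illustrative stand-in, not the paper's method).

    Each molecule joins the first existing cluster whose leader is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    for name, fp in fps.items():
        for leader in clusters:
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[leader].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


def core_library(fps: Dict[str, Fingerprint], threshold: float) -> List[str]:
    """Keep one representative (the leader) per cluster."""
    return list(leader_cluster(fps, threshold))


# Hypothetical toy library: two pairs of near-duplicate molecules.
toy = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 5}),
    "mol_c": frozenset({10, 11, 12}),
    "mol_d": frozenset({10, 11, 12, 13}),
}
clusters = leader_cluster(toy, threshold=0.5)
core = core_library(toy, threshold=0.5)
print(core)
```

In this toy case the four molecules collapse to two representatives, which is the same redundancy reduction the core library aims for at much larger scale. A production workflow would more likely use a cheminformatics toolkit and a representative chosen by centrality rather than insertion order.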