| Screening and crystallisation studies of the crystalline forms of active pharmaceutical ingredients (APIs) are a key part of the drug development process. The crystalline forms of drugs can be classified by composition into single-component crystals, cocrystals and pharmaceutical crystalline salts. Cocrystals and pharmaceutical crystalline salts introduce new chemical components into the crystal lattice, which can significantly alter the crystal properties and crystallisation behaviour of APIs, and they are commonly used in drug development to improve the solid-state properties of APIs. However, there are many potential cocrystal and salt-forming agents, and the crystallisation of cocrystals and crystalline salts is influenced by factors such as solvent, crystallisation method and supersaturation, which makes screening for cocrystals and pharmaceutical crystalline salts very challenging. This study uses artificial intelligence techniques to predict, from various molecular properties, the formation of drug cocrystals or pharmaceutical crystalline salts and the solubility of APIs, in order to significantly narrow down the candidate cocrystal agents, salt-forming agents and crystallising solvents, thereby improving the efficiency of API crystal form screening and crystallisation process development. The study is divided into three parts: development of a cocrystal ligand database and cocrystal virtual screening; virtual screening of salt-forming agents; and prediction of API solubility in common solvents. For cocrystal virtual screening, a ligand database of non-toxic and non-hazardous compounds was first established, containing nearly 300 cocrystal ligand molecules. In addition to the structural formulae of the compounds, the ligand database includes two main classes of properties: thermodynamic properties available in large chemical software packages such as ASPEN, and wave function properties obtained by quantum chemical calculations, together with properties such as
molecular polarity indices and molecular electrostatic potential surface maxima obtained from wave function analysis. With 58 molecular properties as descriptors, 100 positive samples and 100 negative samples were selected and seven machine learning models were examined, resulting in a virtual screening success rate of 97.6%. For the virtual screening of salts, 46 descriptors, including molecular mass, molecular volume and molecular electrostatic potential surface optimum, were assembled with reference to the descriptors used for cocrystal virtual screening, and a virtual screening result of 76.4% was achieved. For solubility prediction, 54 molecular properties related to intermolecular forces, including molecular volume, melting point and boiling point, were selected as descriptors. accuracy, reaching 79.6%, 86.3% and 89.6% respectively. In this study, machine learning was applied to cocrystal virtual screening, pharmaceutical crystalline salt virtual screening and solubility prediction, and good results were obtained, demonstrating that machine learning is promising for crystal form screening and crystallisation development of drugs. The prediction accuracy can be further improved by expanding the sample size and using novel machine learning models. |