| The growing number of chemicals poses an increasing threat to the environment and human health, underscoring the need for better regulatory strategies. Traditional methods for assessing chemical toxicity through in vitro or in vivo experiments are time-consuming and labor-intensive, and they raise ethical concerns. Quantitative structure-activity relationship (QSAR) models have been widely used to predict the potential hazards of chemicals. Although machine learning has made significant progress in evaluating the toxicity of chemicals, most machine learning models are built by selecting a single biological toxicity endpoint (or machine learning algorithm) at random, together with descriptors that rely on the accurate annotation of chemical structures, which leads to unreasonable chemical regulations. To address these limitations, we first propose a more rational framework for assessing the aquatic toxicity of chemicals. In this framework, the ecological risk of each chemical is based on information about its toxicity to multiple aquatic organisms; moreover, the optimal model is selected from 21 different model combinations, comprising 16 models derived from 4 molecular descriptors and 4 traditional machine learning methods, plus 5 deep learning models. Feature importance analysis explains the similarities and differences in how different toxicities relate to molecular properties. We apply this framework to screen high-risk chemicals from over 10,000 compounds and analyze the differences in toxicity between different levels of the food chain, finding that higher organisms may suffer more serious side effects. Our research provides regulatory agencies with a more rational approach to regulating hazardous chemicals. In addition, in many scenarios, such as complex water environments, determining the accurate structure of unknown compounds can be challenging. To address this issue, we link the electron ionization mass spectrometry (EI-MS) data of organic chemicals to toxicity endpoints using various machine learning methods, avoiding
the dependence of descriptor calculation on precise structural analysis. Our proposed method is validated by predicting the 50% growth inhibition concentration of Tetrahymena pyriformis and the drug-induced liver toxicity of compounds. The best models for both the training and test sets obtain R² > 0.7 or balanced accuracy (BACC) > 0.72. External experiments with 10 compounds not included in the dataset further demonstrate the application potential of our proposed method in predicting the toxicity of unknown chemicals. Feature importance analysis enables us to identify key MS features that lead to chemical-induced toxicity. Finally, we also predict the toxicity of herbal extracts, which demonstrates the potential of this method for predicting the toxicity of mixtures. In summary, our method has great potential for toxicity prediction in fields where accurate chemical structures are difficult to determine. |
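
The two performance measures quoted above, R² for the regression endpoint and balanced accuracy (BACC) for the classification endpoint, can be illustrated with a minimal sketch. The abstract does not state how the metrics were computed, so the code below assumes the standard textbook definitions; the function names and the toy values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; with two classes this equals
    (sensitivity + specificity) / 2."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical toy values, for illustration only (not data from the study).
regression_score = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
classification_score = balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0])
print(regression_score, classification_score)
```

Balanced accuracy averages recall over classes, so a model cannot score well simply by predicting the majority class, which matters when toxic and non-toxic compounds are unevenly represented.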