| Organic contaminants (OCs) emerging in the water environment pose a major threat to human and ecological health. Advanced oxidation processes (AOPs) have been widely used to remove OCs because of their strong oxidizing power, high degradation efficiency, and low secondary pollution. The oxidation rate constant (kAOPs) is an important parameter for evaluating the degradation performance of AOPs and for their industrial design. However, determining the kAOPs of OCs by experiment alone is inefficient and costly. Quantitative structure-activity relationship (QSAR) models can predict the kAOPs of OCs with high throughput, compensating for the lack of experimental data. In this study, machine learning algorithms were used to construct QSAR models for predicting kAOPs in three typical AOPs whose main active species are the hydroxyl radical (·OH), the sulfate radical (SO4·−), and ozone (O3); the corresponding rate constants are denoted k·OH, kSO4, and kO3, respectively. The main research work is as follows. Datasets of kAOPs for the three AOPs were collected from the literature, with sizes of 1389 (k·OH), 407 (kSO4), and 494 (kO3), respectively. Six machine learning algorithms (Support Vector Machine (SVM), K-Nearest Neighbors, Decision Tree, Random Forest (RF), Gradient Boosting Decision Tree, and Extreme Gradient Boosting (XGB)) were combined with 13 molecular fingerprints (Morgan, MACCS, CDK, Ext, Graph, Estate, Pub, Sub, SubC, KR, KRC, AP2D, and AP2DC) to construct QSAR models for predicting kAOPs. The algorithms were implemented in Python, and the molecular fingerprints were calculated with the RDKit package and the PaDEL-Descriptor software. The robustness, predictive ability, and applicability of the models were evaluated using ten-fold cross-validation, external validation, and applicability domain characterization based on molecular fingerprint similarity. The best k·OH, kSO4, and kO3 prediction models are RF-Pub (coefficient of determination R² = 0.938, external validation coefficient 
Q²ext = 0.825, average similarity threshold Tmean = 0.039), SVM-Pub (R² = 0.993, Q²ext = 0.853, Tmean = 0.067), and RF-MACCS (R² = 0.925, Q²ext = 0.727, Tmean = 0.099), respectively. The SHapley Additive exPlanations (SHAP) method was used to interpret the XGB-based kAOPs prediction models, and the top 10 feature sites in the Morgan fingerprint contributing to kAOPs were identified. Based on the molecular structural information represented by these sites, the key active sites of OCs in the oxidation reactions of AOPs were revealed. The results show that aromatic C atoms, double-bond C atoms, primary carbons, and thioether S atoms in OCs are the main active sites for reaction with ·OH; aromatic C atoms, imino N atoms, hydroxyl groups attached to aromatic C, secondary carbons, and methyl groups attached to aromatic N are the main active sites for reaction with SO4·−; and hydroxyl groups attached to aromatic C, amino groups attached to aromatic C, aromatic C atoms, double-bond C atoms, and amino N atoms are the main active sites for reaction with O3. Comparing the main active sites across the three AOPs provides theoretical guidance for selecting an AOP for a target OC. |
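The abstract does not spell out how the similarity-based applicability domain is computed, so the following is only a minimal sketch of one common reading: a query compound is treated as inside the domain when its mean Tanimoto similarity to the training-set fingerprints is at least the threshold Tmean. The decision rule, the function names (`tanimoto`, `mean_similarity`, `in_domain`), the toy bit indices, and the threshold value are all assumptions made for illustration and are not taken from the study. Fingerprints are represented as plain sets of on-bit indices so that the example needs no cheminformatics library.

```python
from statistics import mean


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints share no information; treat them as dissimilar.
    return len(a & b) / union if union else 0.0


def mean_similarity(query: frozenset, training: list) -> float:
    """Average Tanimoto similarity between a query and every training fingerprint."""
    return mean(tanimoto(query, fp) for fp in training)


def in_domain(query: frozenset, training: list, t_mean: float) -> bool:
    """Assumed rule: inside the domain if the mean similarity reaches t_mean."""
    return mean_similarity(query, training) >= t_mean


# Toy fingerprints with hypothetical bit indices (not real PubChem or MACCS bits).
training = [
    frozenset({1, 2, 3, 4}),
    frozenset({2, 3, 4, 5}),
    frozenset({1, 3, 5, 7}),
]
similar_query = frozenset({2, 3, 4})
dissimilar_query = frozenset({40, 41, 42})

T_MEAN = 0.3  # illustrative threshold only, not a value reported in the abstract

print(in_domain(similar_query, training, T_MEAN))
print(in_domain(dissimilar_query, training, T_MEAN))
```

In practice the bit sets would come from the same fingerprint used to train the model (for example Pub or MACCS), and predictions for compounds falling outside the domain would be flagged as less reliable rather than discarded outright.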