Continuous economic development has led to increasing energy consumption, and emissions of polycyclic aromatic hydrocarbons (PAHs), a class of persistent organic pollutants widely distributed in various environmental media, are rising accordingly. PAHs can enter the human body through multiple pathways, and their toxicity poses serious risks to the environment and to biological health. Traditional biological testing and chemical analysis methods are time-consuming and labor-intensive, and they raise safety and ethical concerns. Machine learning combined with quantitative structure-activity relationship (QSAR) modeling can capture how the physicochemical properties of compounds vary with their molecular structure, enabling rapid prediction of PAH toxicity. Motivated by the practical need for accurate prediction of PAH toxicity, this thesis constructs PAH toxicity prediction models based on machine learning combined with QSAR. The thesis is divided into five chapters, and its main research contents are as follows:

(1) Three prediction methods, PLS-QSAR, LSSVM-QSAR, and RF-QSAR (based on partial least squares, least squares support vector machine, and random forest, respectively), were established for the skin permeability of PAHs. First, the molecular descriptors of 49 PAHs were calculated with E-dragon and normalized to eliminate differences in magnitude. Then, two variable selection methods, variable importance in projection (VIP) and the variable importance measure (VIM), were used to screen the descriptors, and calibration models were built on the selected descriptors to predict the skin permeability of PAHs. Finally, the three models were compared using the coefficient of determination (R²), root mean square error (RMSE), and mean relative error (MRE) as evaluation indicators. The results show that the VIM-RF-QSAR model has the best predictive performance: on the calibration set, R²_C = 0.9565, RMSE_C = 2.7307, and MRE_C = 11.24%; in out-of-bag (OOB) cross-validation, R²_OOB = 0.7092, RMSE_OOB = 6.4009, and MRE_OOB = 27.41%; and on the prediction set, R²_P = 0.7490, RMSE_P = 5.9335, and MRE_P = 16.65%. This method saves time and labor and yields accurate predictions, providing a feasible approach to predicting the skin permeability of PAHs.

(2) An RF-QSAR-based prediction method was established for the acute toxicity of PAHs. First, the molecular descriptors of 80 PAHs were calculated with E-dragon and normalized to eliminate differences in magnitude. Then, a hybrid variable selection method combining the successive projections algorithm (SPA) and VIM was proposed to screen the descriptors, and an RF model was built on the selected descriptors to predict the acute toxicity of PAHs. Finally, the results were compared with those of the RF-QSAR and VIM-RF-QSAR models. The results show that the SPA-VIM-RF-QSAR model has the best predictive performance: on the calibration set, R²_C = 0.9673, RMSE_C = 0.1162, and MRE_C = 4.40%; in OOB cross-validation, R²_OOB = 0.7710, RMSE_OOB = 0.2769, and MRE_OOB = 10.86%; and on the prediction set, R²_P = 0.7456, RMSE_P = 0.2424, and MRE_P = 8.81%. This method is computationally efficient and yields accurate predictions, making it an effective and accurate approach for predicting the acute toxicity of PAHs.

(3) Classification methods based on RF-QSAR and partial least squares-linear discriminant analysis (PLS-LDA-QSAR) were established for the mutagenicity of PAHs. First, the molecular descriptors of 70 PAHs were calculated with E-dragon and preliminarily screened. Then, VIM was used to select features from the descriptors, and models classifying the mutagenicity of PAHs toward human B lymphocytes were built on the selected descriptors. Finally, the classification results of the VIM-RF-QSAR and PLS-LDA-QSAR models were compared. The VIM-RF-QSAR model classified well, achieving an accuracy of 0.8824, a positive predictive value (PPV) of 1.0000, and a negative predictive value (NPV) of 0.6000 with 31 selected molecular descriptors. RF is therefore a promising method for predicting PAH mutagenicity.
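The normalization step and the R², RMSE, and MRE indicators used in (1) and (2) can be sketched in Python as follows. The abstract does not name the normalization method, so min-max scaling to [0, 1] is assumed here. R² is computed as 1 - SS_res/SS_tot, which is also an assumption, since some QSAR studies report the squared correlation coefficient instead. The function names `min_max_normalize` and `regression_metrics` are hypothetical.

```python
import math
from typing import List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one descriptor column to [0, 1] (assumed normalization method)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant descriptor: map every value to 0.0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return R^2 (as 1 - SS_res/SS_tot), RMSE, and MRE in percent."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    # MRE assumes that no observed value is zero.
    mre = 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n
    return {"R2": 1.0 - ss_res / ss_tot, "RMSE": rmse, "MRE": mre}


if __name__ == "__main__":
    # Hypothetical values, not data from this study.
    print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
    print(regression_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 3.0]))
```

Scaling each descriptor column independently keeps descriptors with large raw magnitudes from dominating distance- or variance-based models such as LSSVM and PLS.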
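For reference, the standard PLS definition of the VIP score used for descriptor screening in (1) is given below. The abstract does not state which VIP variant or cutoff the thesis used. Here p is the number of descriptors, A the number of PLS components, w_{ja} the weight of descriptor j on component a, t_a the score vector, and q_a the y-loading of component a, so SS_a is the response variance explained by component a.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^2}{\sum_{a=1}^{A} SS_a}},
\qquad SS_a = q_a^2 \, \mathbf{t}_a^{\top} \mathbf{t}_a
```

Because each normalized weight vector has unit length, the squared VIP scores sum to p and therefore average 1 across descriptors. For this reason, descriptors with VIP > 1 are often retained, although this is a common rule of thumb rather than the threshold reported in this work.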
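The successive projections algorithm (SPA) behind the hybrid SPA-VIM selection in (2) chooses descriptors whose columns are minimally collinear. The abstract does not describe how SPA and VIM are combined, so the sketch below shows only a basic SPA chain. It starts from an assumed initial column. At each step it projects every unselected column onto the orthogonal complement of the most recently selected column, then picks the column with the largest residual norm. The function `spa_select` and its parameters are hypothetical.

```python
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def spa_select(X: Sequence[Sequence[float]], start: int, n_select: int) -> List[int]:
    """Return indices of up to n_select columns of X (rows = samples) chosen by SPA."""
    n_cols = len(X[0])
    # Column vectors; unselected ones are overwritten by their projections.
    cols = [[row[j] for row in X] for j in range(n_cols)]
    selected = [start]
    for _ in range(n_select - 1):
        last = cols[selected[-1]]
        denom = _dot(last, last)
        best, best_norm = -1, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            # Remove the component of column j along the last selected column.
            coef = _dot(cols[j], last) / denom if denom > 0 else 0.0
            cols[j] = [a - coef * b for a, b in zip(cols[j], last)]
            norm_sq = _dot(cols[j], cols[j])
            if norm_sq > best_norm:
                best, best_norm = j, norm_sq
        if best < 0:  # every column is already selected
            break
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Hypothetical 3-sample x 3-descriptor matrix; column 1 is nearly collinear with column 0.
    X = [[1.0, 1.0, 0.0],
         [0.0, 0.1, 0.0],
         [0.0, 0.0, 1.0]]
    print(spa_select(X, start=0, n_select=2))  # [0, 2]
```

In the example, column 1 adds little information beyond column 0, so SPA selects the orthogonal column 2 next. This redundancy-avoiding behavior is what makes SPA a natural complement to an importance ranking such as VIM.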
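The accuracy, PPV, and NPV reported in (3) all derive from the binary confusion matrix: PPV = TP/(TP+FP) and NPV = TN/(TN+FN). The minimal sketch below assumes that labels are coded 1 for mutagenic (the positive class) and 0 for non-mutagenic, a convention the abstract does not state. The function name `binary_metrics` is hypothetical.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, PPV, and NPV, assuming 1 = mutagenic and 0 = non-mutagenic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Fraction of "mutagenic" predictions that are correct.
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    # Fraction of "non-mutagenic" predictions that are correct.
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return {"accuracy": accuracy, "PPV": ppv, "NPV": npv}


if __name__ == "__main__":
    # Hypothetical labels, not data from this study.
    print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Under this coding, a PPV of 1.0000 means that no non-mutagenic compound in the evaluated set was classified as mutagenic. An NPV of 0.6000 means that 40% of the compounds classified as non-mutagenic were actually mutagenic.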