Molecular Feature Extraction Strategy and Ensemble Learning Assisted Prediction for Environmental and Hazardous Properties of Organic Compounds

Posted on: 2021-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
GTID: 2491306107990629
Subject: Chemical Engineering and Technology
Abstract/Summary:
The fundamental properties of organic compounds play a vital role in chemical engineering applications such as product design, safety assessment and solvent selection. Environmental benefit and process safety, the basic targets of sustainable industrial development, rely on the evaluation of environmental and hazardous properties, which drives chemical process engineering and environmental science toward environmentally friendly and safe technologies. However, environmental and hazardous property databases are limited, and the experiments needed to expand them are lengthy and dangerous, which hinders extensive experimentation and database updates. With advances in computing and artificial intelligence, researchers have developed mathematical models as alternatives to experiments, achieving rapid and accurate property estimation. Traditional group contribution methods play an important part in property prediction, but the same molecular structure can often be divided into groups in more than one way, which is not conducive to model development.

To address this, a novel method for extracting features from molecular structures is proposed, characterized by good interpretability and the ability to discriminate between isomers. Each molecular structure admits only one combination of molecular features, which avoids assigning different predicted values to the same compound. The molecular feature extraction strategy is coupled with a feedforward neural network optimized by five-fold cross-validation; property prediction models are developed with the training data, and their extrapolation ability is evaluated with the test data. New prediction models are developed with a dataset of Henry’s law constants of pure organic compounds in water, and they show good predictive performance and extrapolation ability. Comparisons show that introducing a three-dimensional molecular descriptor into the feature vector and a clustering algorithm into data partitioning enhanced both the models' ability to discriminate between isomers and their reasonableness, and also improved predictive performance. Compared with models reported in the literature, the developed model uses fewer molecular features and shows better accuracy and generality, with a root mean squared error of 0.2981, a mean absolute error of 0.1544, a coefficient of determination of 0.9856 and an adjusted coefficient of determination of 0.9853.

In addition, an ensemble learning framework is constructed to couple individual predictive models based on different machine learning algorithms, and the effect of ensembling heterogeneous machine learning algorithms on the resulting predictive models is investigated. The ensemble learning framework is optimized with five-fold cross-validation and built with the training data. Using the molecular feature extraction strategy, individual and ensemble predictive models are built with the training data, and their extrapolation ability is evaluated with the test data. New predictive models are developed on a flash point dataset, and the ensemble predictive models are found to show better predictive performance than the individual ones. Comparisons between individual and ensemble models indicate that the predictive performance of an ensemble model can be enhanced by improving the predictive accuracy of the models it combines and the diversity of their algorithms.

The proposed molecular feature extraction strategy and ensemble learning framework prove feasible for developing property prediction models, and the new models show good predictive performance and extrapolation ability, enabling efficient development of high-performance prediction models. This work provides accurate and reliable predictive tools for chemical engineering research, including product design, safety assessment and solvent selection.
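The abstract says a clustering algorithm was introduced into data partitioning but does not name the algorithm or the split procedure. One common way to do this, sketched below purely as an assumption, is to cluster the feature vectors with k-means and then draw roughly the same fraction of each cluster into the test set, so that test compounds cover the same regions of feature space as training compounds. The use of k-means, the cluster count, the 20 % test fraction, the toy data and the helper names `kmeans_labels` and `cluster_split` are all hypothetical.

```python
import math
import random


def kmeans_labels(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for every row of X."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, k)]  # random initial centres
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(x, centroids[c])) for x in X]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_split(X, k=3, test_fraction=0.2, seed=0):
    """Draw roughly test_fraction of each cluster's members into the test set."""
    labels = kmeans_labels(X, k, seed=seed)
    rng = random.Random(seed)
    train, test = [], []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test += members[:n_test]
        train += members[n_test:]
    return sorted(train), sorted(test)


# Toy usage: three well-separated groups of made-up 2-D feature vectors.
X = [[g + 0.1 * (i % 3), g + 0.1 * (i // 3)] for g in (0.0, 5.0, 9.0) for i in range(5)]
train_idx, test_idx = cluster_split(X, k=3, test_fraction=0.2)
print("train:", train_idx)
print("test:", test_idx)
```

Compared with a purely random split, sampling per cluster makes it less likely that an entire region of chemical space ends up only in the training set or only in the test set.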
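The four statistics quoted for the Henry’s law constant model are standard regression metrics. A minimal sketch in plain Python follows, assuming the usual definitions: RMSE is the square root of the mean squared residual, MAE is the mean absolute residual, R² = 1 - SS_res/SS_tot, and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), with n samples and p input features. The function name `regression_metrics` and the toy numbers are illustrative, not taken from the thesis.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """RMSE, MAE, R^2 and adjusted R^2 for paired observed/predicted values."""
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_features - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > n_features + 1")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": r2,
        # Adjusted R^2 penalises the number of input features p = n_features.
        "r2_adj": 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1),
    }


# Toy usage with made-up numbers (not the thesis data).
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0], n_features=1)
print({name: round(value, 4) for name, value in metrics.items()})
```

For these five toy points the residuals are -0.1, 0.1, -0.2, 0.2 and 0, so MAE = 0.12, RMSE = sqrt(0.02) ≈ 0.1414, R² = 1 - 0.1/10 = 0.99 and, with p = 1, adjusted R² = 1 - 0.01 × 4/3 ≈ 0.9867. Because the factor (n - 1)/(n - p - 1) grows with p, adjusted R² falls further below R² as more features are used.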
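The abstract does not state how the individual models are coupled or which machine learning algorithms are used. The sketch below assumes the simplest coupling scheme, an unweighted average of heterogeneous base regressors, scored by five-fold cross-validation on the training data, which is the kind of estimate one would use when tuning such a framework. A k-nearest-neighbours regressor and a linear least-squares regressor stand in for the thesis's base algorithms; every class and function name is hypothetical, and weighted averaging or stacking would be natural alternatives.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class KNNRegressor:
    """Predicts the mean target of the k nearest training points (Euclidean)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for x in X:
            nearest = sorted((math.dist(x, xt), yt) for xt, yt in zip(self.X, self.y))[: self.k]
            out.append(sum(yt for _, yt in nearest) / len(nearest))
        return out


class LinearRegressor:
    """Least squares with intercept; a tiny ridge term keeps the system solvable."""

    def __init__(self, ridge=1e-8):
        self.ridge = ridge

    def fit(self, X, y):
        A = [[1.0, *x] for x in X]  # prepend a column of ones for the intercept
        p = len(A[0])
        # Augmented normal equations [(A^T A + ridge * I) | A^T y].
        M = [[sum(r[i] * r[j] for r in A) + (self.ridge if i == j else 0.0) for j in range(p)]
             + [sum(r[i] * t for r, t in zip(A, y))] for i in range(p)]
        # Gauss-Jordan elimination with partial pivoting.
        for c in range(p):
            piv = max(range(c, p), key=lambda r: abs(M[r][c]))
            M[c], M[piv] = M[piv], M[c]
            for r in range(p):
                if r != c:
                    f = M[r][c] / M[c][c]
                    M[r] = [a - f * b for a, b in zip(M[r], M[c])]
        self.w = [M[i][p] / M[i][i] for i in range(p)]
        return self

    def predict(self, X):
        return [self.w[0] + sum(w * v for w, v in zip(self.w[1:], x)) for x in X]


class AveragingEnsemble:
    """Unweighted mean of the predictions of heterogeneous base models."""

    def __init__(self, make_models):
        self.make_models = make_models  # callable returning fresh, unfitted models

    def fit(self, X, y):
        self.models = [m.fit(X, y) for m in self.make_models()]
        return self

    def predict(self, X):
        preds = [m.predict(X) for m in self.models]
        return [sum(col) / len(col) for col in zip(*preds)]


def cv_rmse(make_model, X, y, k=5, seed=0):
    """RMSE pooled over the held-out folds of k-fold cross-validation."""
    sq_err = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in fold])
        sq_err += [(y[i] - p) ** 2 for i, p in zip(fold, preds)]
    return math.sqrt(sum(sq_err) / len(sq_err))


# Toy usage on made-up data (not thesis data): each base model versus the ensemble.
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [2 * a - b + 1 + 0.3 * math.sin(a) for a, b in X]
for name, factory in [
    ("kNN", lambda: KNNRegressor(3)),
    ("linear", LinearRegressor),
    ("ensemble", lambda: AveragingEnsemble(lambda: [KNNRegressor(3), LinearRegressor()])),
]:
    print(name, round(cv_rmse(factory, X, y), 4))
```

Because `cv_rmse` takes a factory rather than a fitted model, every fold trains freshly initialized base models, so nothing from the held-out fold leaks into training. Whether the average beats its members depends on the data; the abstract's observation that accuracy and algorithmic diversity of the members both matter is exactly what such a comparison probes.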
Keywords/Search Tags: quantitative structure-property relationship, predictive model, molecular feature, machine learning, ensemble learning