
Prediction Of Atomization Energy Of Organic Molecules By Improved Deep Neural Network

Posted on: 2023-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2531307043995369
Subject: Engineering
Abstract/Summary:
Accurate prediction of molecular properties across chemical compound space (CCS) is a key factor in rational compound design for the chemical and pharmaceutical industries. Automated screening of CCS that combines experiment and theory has become a powerful tool, both for discovering new systems and for the rational design of chemicals and materials for targeted applications. Atomization energy is one of the most basic properties of a compound. For some compounds it can be determined experimentally, but for compounds that are difficult to measure or have not yet been synthesized it must be estimated or predicted. Because the atomization energy of a compound is closely related to its molecular structure, it can be calculated and predicted through theoretical analysis and through predictive models built on empirical regularities. The Coulomb matrix, a descriptor built from the nuclear charges of a molecule's atoms, has been used to predict the atomization energy of molecules. However, as the number of molecules in the data set grows, the root mean square error (RMSE) of the prediction also grows.

In this thesis, a systematic atomization energy prediction method is designed by combining a deep neural network with the Coulomb matrix eigenvalue molecular descriptor. The bootstrap sampling method is improved to remove two defects: it cannot capture features outside the sample, and it underuses the available samples. The improved algorithm is then applied to the deep neural network model to further improve its overall prediction performance. The improvements are as follows.

(1) As the number of molecules increases, randomly sampled features can no longer represent all features, and the distribution at observation points between discontinuous samples cannot be obtained. To address this, the thesis adds an improved bootstrap sampling method to the deep neural network. For a data set D, the neighborhood of D is computed first, giving an extended set that contains the original data set D together with the neighborhoods of its samples. Bootstrap sampling is then performed on this extended set to draw the training samples at random. The training samples therefore contain not only samples from D but also sample features from outside the original data set. The improved bootstrap expands the original discontinuous molecular sample set into a continuous one, which ensures data continuity and at the same time extends molecular sampling to features beyond the original molecular samples.

(2) To address the insufficient utilization of the molecular sample data set, the sampling step of the traditional bootstrap is changed. For a data set D containing M samples, each sampling step draws M molecular samples instead of one, and the probability of these M samples being drawn in a single step is the same as before. Random sampling is repeated N times, so the resulting training set contains M*N molecular samples, which improves the utilization of the molecular samples and increases the number of molecular training samples.

(3) To optimize the structure of the deep neural network model, the network is trained with the error backpropagation algorithm, the learning rate follows a dynamically decaying schedule, and the number of training iterations is fixed. A network optimized in this way is prone to overfitting, so overfitting is alleviated through early stopping and regularization. Principal component analysis (PCA) is used to reduce the dimensionality of the sample features, and ten-fold cross-validation is used to validate the model.

Finally, organic small-molecule data collected from the PubChem database are used as experimental data, and the DNN-Atom model is used to predict atomization energy. The results show that the proposed DNN-Atom model reduces the RMSE of atomization energy prediction. Compared with the classical DNN, XGBoost and SVM algorithms, the DNN-Atom model has the lowest RMSE and gives more accurate predictions. Analysis of the drawn samples shows that they clearly represent the characteristics of all samples, which demonstrates the feasibility of the proposed DNN-Atom model for atomization energy prediction.
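
As a rough illustration of two ingredients named in the abstract, the Python sketch below builds a Coulomb matrix eigenvalue descriptor and draws the index sets for the second sampling improvement (M samples per draw, repeated N times). It assumes the Coulomb matrix definition commonly used in the literature (diagonal 0.5 * Z_i^2.4, off-diagonal Z_i * Z_j / |R_i - R_j|, coordinates in atomic units); the abstract does not say which variant the thesis uses. The function names, the padding length `size`, and the seed are hypothetical. The neighborhood expansion of the first improvement is not shown, because the abstract does not specify how neighborhoods are computed.

```python
import numpy as np


def coulomb_matrix(charges, coords):
    """Coulomb matrix under the assumed standard definition.

    Diagonal: 0.5 * Z_i ** 2.4. Off-diagonal: Z_i * Z_j / |R_i - R_j|.
    Coordinates are assumed to be in atomic units (bohr).
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    # Pairwise distances between all atoms, shape (n_atoms, n_atoms).
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder so the division below is safe
    cm = np.outer(z, z) / dist
    np.fill_diagonal(cm, 0.5 * z ** 2.4)  # overwrite the placeholder diagonal
    return cm


def eigenvalue_descriptor(charges, coords, size):
    """Eigenvalues sorted by descending magnitude, zero-padded to `size`."""
    eig = np.linalg.eigvalsh(coulomb_matrix(charges, coords))
    if len(eig) > size:
        raise ValueError("size must be at least the number of atoms")
    eig = eig[np.argsort(-np.abs(eig))]
    out = np.zeros(size)
    out[: len(eig)] = eig
    return out


def bootstrap_rounds(n_samples, n_rounds, seed=0):
    """Draw n_samples indices with replacement in each of n_rounds rounds.

    Returns an integer array of shape (n_rounds, n_samples), i.e. M*N
    training indices for M = n_samples and N = n_rounds.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_samples, size=(n_rounds, n_samples))
```

For example, `eigenvalue_descriptor([1, 1], [[0, 0, 0], [0, 0, 1.4]], size=4)` gives a length-4 vector for H2 with its two eigenvalues followed by zeros, and `bootstrap_rounds(M, N)` returns an N-by-M index array that selects M*N training samples.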
Keywords/Search Tags: atomization energy, deep neural network, sampling feature, molecular descriptors, DNN-Atom model