| With the rapid development of computer technology and the geometric growth of the big data industry, research on quantitative structure-activity/property relationships (QSAR) of compounds has also advanced rapidly and risen to a higher level. First applied in biology, it has gradually extended to pharmaceutical science, environmental science, medicinal chemistry, drug design, medicine, and many other fields. Its purpose is to use statistical methods to study the relationship between the calculated structural parameters of compounds and their physicochemical properties and biological activities, and thereby to understand compounds at the level of molecular microstructure. Its scope is broad, covering the biological activity of compounds, drug toxicity, the rate of drug absorption in the human body, and more. It is especially relevant in environmental chemistry: as more and more organic compounds enter the environment, the adsorption of commercial chemicals to soil and sediment becomes an important process governing their migration and transformation in the soil environment. However, previous QSAR models have mostly relied on shallow machine learning methods such as heuristic methods, multiple linear regression, RBF neural networks, back-propagation neural networks, and support vector machines. What these methods have in common is that they suit scenarios with small sample sizes and problems that are not particularly complex, which limits their ability to generalize to complex problems and massive data. In recent years, deep learning, a branch of machine learning, has been widely used in many fields and has achieved a series of satisfactory results. In the current era of big data in particular, deep learning is needed to handle complex problems that shallow machine learning models cannot solve. This paper takes oral bioavailability, CYP450 1A2 inhibitors, and logKoc as its research objects and builds deep-learning-based models for drug classification and logKoc prediction. The work consists of three parts. The first part studies the classification of oral bioavailability based on a stacked autoencoder (SAE): the stacked autoencoder performs feature selection on molecular 2D and 3D features, and a softmax layer completes the classification task. The model is compared with a Support Vector Machine (SVM) and an Artificial Neural Network to verify the effectiveness of the stacked autoencoder in predicting oral bioavailability. The second part addresses the classification of CYP450 1A2 inhibitors based on a deep belief network. Following the ideas of deep learning, a deep belief network is proposed and 13900 compounds are used to predict CYP450 1A2 inhibitors. MACCS fingerprints are used to characterize molecular structure and are combined with semi-supervised learning to learn a more essential feature representation. This avoids a manual feature extraction process and implements the classification of CYP1A2 inhibitors. The third part studies the prediction of the soil adsorption coefficient based on a deep recursive neural network. Molecules are first represented as undirected graphs; a recursive neural network (the UGRNN model) then abstracts features from the molecular graph structures, and these features are finally used to predict the value of logKoc. The experimental results show that, compared with other shallow learning models, the UGRNN model achieves better prediction of logKoc. In addition, the Pearson correlation coefficient is used to select one feature, the octanol-water partition coefficient (logP); supplying it as an additional input to the UGRNN model (UGRNN+logP for short) further improves the prediction of logKoc. |
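
The Pearson-based feature selection mentioned in the third part can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `pearson` and `best_feature` are hypothetical, and the descriptor names and all numeric values are invented placeholders rather than data from this work. The sketch ranks candidate descriptors by the absolute value of their Pearson correlation with logKoc and returns the strongest one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def best_feature(descriptors, target):
    """Return (name, r) for the descriptor with the largest |r| against target."""
    scores = {name: pearson(values, target) for name, values in descriptors.items()}
    name = max(scores, key=lambda n: abs(scores[n]))
    return name, scores[name]

# Toy, invented values for five hypothetical compounds (NOT data from the paper).
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mol_weight": [120.0, 95.0, 180.0, 150.0, 110.0],
    "h_bond_donors": [2.0, 1.0, 2.0, 0.0, 1.0],
}
log_koc = [1.2, 1.9, 2.7, 3.1, 4.0]

name, r = best_feature(descriptors, log_koc)
print(name, round(r, 3))
```

In a setup like the one the abstract describes, the selected descriptor would then be fed to the prediction model alongside the learned graph features. How that combination is wired is not specified here, so the sketch stops at the selection step.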