QSPR Research On The Melting Points Of Some Organic Compounds Based On K-nearest Neighbors, K-means Clustering Algorithm And Projection Pursuit Pattern Recognition Methods

Posted on: 2019-11-16  Degree: Master  Type: Thesis
Country: China  Candidate: X Ma
GTID: 2404330572960793  Subject: Drug Analysis
Abstract/Summary:
The melting point is one of the fundamental physical properties of a compound. Under given conditions, a pure organic compound has a fixed melting point, which is determined mainly by intramolecular and intermolecular interactions. Measuring the melting point of an organic compound therefore helps to identify the compound and to assess its purity. Researchers usually obtain melting points experimentally or by empirical methods, but for some organic compounds the existing experimental methods are not sufficient to determine them. It is therefore necessary to predict melting points by quantitative structure-property relationship (QSPR) modeling, which can reduce the time, money and labor spent on melting point determination. Many QSPR studies of melting points have already been reported, using a variety of descriptors and modeling methods to reach reasonable conclusions. Building on this work, this thesis extends the QSPR methodology for melting points. The main work is as follows.
(1) Two data sets were selected. The first consists of organic acids containing only C, H and O. The second consists of drug compounds, most of which are lipid compounds, with the remainder being ketones and amides. Both data sets consist of complex, non-homologous compounds.
(2) ADMEWORKS ModelBuilder was used to calculate and select the descriptors for both data sets, and QSPR models were built with the melting point as the dependent variable and the descriptors as the independent variables. A robust diagnostic method was first used to identify and remove outliers. Three pattern recognition methods (K-nearest neighbors, K-means clustering and projection pursuit) were then used to classify the samples.
(3) For both the unclassified and the classified samples, 20% of the samples were randomly selected as the external test set, and the remaining samples were divided into a training set and an internal test set by the sphere exclusion algorithm. Multiple linear regression (MLR), partial least squares (PLS) and an artificial neural network (ANN) were then used to predict the melting points of the training, internal test and external test sets.
(4) The structural similarities of the molecules were calculated, and their influence on the modeling results was investigated.
(5) The errors between the predicted and the corresponding experimental values were calculated with the error formula.
(6) The results show that all three pattern recognition methods improved the QSPR results. The similarity calculations indicate that the predictive ability of a model depends not only on structural similarity but also on the modeling method. Among the three modeling methods, ANN gave better predictions than MLR and PLS; that is, on the whole, the nonlinear models predicted better than the linear ones.
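As a rough illustration of how K-nearest neighbors can assign a compound to a class from its descriptor vector, the sketch below uses Euclidean distance and a majority vote. The descriptor values, the class labels "A" and "B", and k = 3 are hypothetical; the abstract does not state which distance metric or k was used.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Return the majority class among the k training samples nearest to `query`."""
    # Rank training samples by Euclidean distance to the query; keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote; on a tie, Counter.most_common keeps first-encountered order,
    # so the class seen first among the (distance-sorted) neighbors wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-descriptor data for six compounds in two classes.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.85, 0.95), (1.0, 0.9)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_X, train_y, (0.2, 0.2)))    # A
print(knn_classify(train_X, train_y, (0.95, 0.85)))  # B
```

Descriptors on very different scales would normally be standardized first, since Euclidean distance is dominated by the largest-valued descriptor.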
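Unlike K-nearest neighbors, K-means needs no predefined classes. It groups samples by alternating an assignment step and a centroid-update step. This is a minimal sketch of the standard (Lloyd) iteration, assuming Euclidean distance and user-supplied starting centroids; the abstract does not describe how the thesis initialized its clusters, and the points are hypothetical.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's K-means: alternate assignment and centroid-update steps until stable."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid where it was
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical descriptor vectors forming two well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, cents = kmeans(pts, [(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```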
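The data-splitting scheme in (3) can be sketched as follows. The random 20% external split follows the text. The sphere exclusion routine is one simple variant assumed here for illustration: a fixed radius, with each new center taken as the remaining sample nearest to the current training set. The function names, radius and data are hypothetical.

```python
import math
import random

def random_external_split(n, fraction=0.2, seed=0):
    """Randomly hold out `fraction` of the n sample indices as an external test set."""
    rng = random.Random(seed)
    external = set(rng.sample(range(n), round(n * fraction)))
    rest = [i for i in range(n) if i not in external]
    return rest, sorted(external)

def sphere_exclusion_split(points, radius, start=0):
    """Split positions 0..len(points)-1 into training and internal test sets.
    Simplified variant: move one center into the training set, send every
    remaining sample within `radius` of it to the test set, and repeat."""
    remaining = list(range(len(points)))
    train, test = [], []
    current = start
    while remaining:
        remaining.remove(current)
        train.append(current)
        inside = [i for i in remaining if math.dist(points[i], points[current]) <= radius]
        for i in inside:
            remaining.remove(i)
            test.append(i)
        if remaining:
            # Next center: the remaining sample nearest to the training set so far.
            current = min(remaining,
                          key=lambda i: min(math.dist(points[i], points[t]) for t in train))
    return train, test

# Hypothetical one-descriptor data for six compounds.
points = [(0.0,), (0.5,), (2.0,), (2.4,), (5.0,), (5.3,)]
print(sphere_exclusion_split(points, radius=1.0))  # ([0, 2, 4], [1, 3, 5])
```

In the workflow described above, sphere exclusion runs only on the samples left after the external split, so its positions map back through the `rest` list, for example `[rest[i] for i in train]`. The radius controls the training/test ratio: a larger radius sends more samples to the internal test set.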
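Of the three modeling methods, MLR is the simplest to write out. The sketch below fits an MLR model (melting point against descriptors) by ordinary least squares, solving the normal equations with Gaussian elimination so it needs only the standard library. A numerical library would normally be used in practice. The function names and data are hypothetical, and PLS and ANN are not shown.

```python
def fit_mlr(X, y):
    """Ordinary least squares: solve (A^T A) b = A^T y, where A is X with a leading column of ones."""
    A = [[1.0] + list(row) for row in X]
    n, p = len(A), len(A[0])
    # Build the normal equations as an augmented matrix [A^T A | A^T y].
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
         + [sum(A[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution gives [intercept, coef_1, ..., coef_m].
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

def predict_mlr(b, X):
    """Predicted values b0 + sum(b_i * x_i) for each descriptor row."""
    return [b[0] + sum(bi * xi for bi, xi in zip(b[1:], row)) for row in X]

# Hypothetical data generated exactly by y = 2 + 3*x1 - x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [2, 5, 1, 4, 7]
b = fit_mlr(X, y)
print([round(v, 6) for v in b])  # [2.0, 3.0, -1.0]
```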
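The abstract does not name the structural similarity measure used in (4). A common choice in QSPR work, assumed here, is the Tanimoto coefficient on binary fingerprints. The sketch stores each fingerprint as a set of on-bit indices and reports, for every test compound, its highest similarity to any training compound. The fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention assumed here for two empty fingerprints
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

def max_similarity_to_training(test_fps, train_fps):
    """For each test fingerprint, the largest Tanimoto similarity to any training fingerprint."""
    return [max(tanimoto(t, r) for r in train_fps) for t in test_fps]

print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))  # 0.4
```

A low maximum similarity flags a test compound that lies outside the structural space covered by the training set, which is one way to relate prediction error to structural similarity.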
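Point (5) refers to an error formula without giving it. As an assumption, the sketch computes three statistics commonly reported for QSPR models: mean absolute error (MAE), root-mean-square error (RMSE) and the coefficient of determination R². The function name is hypothetical.

```python
import math

def error_stats(y_true, y_pred):
    """MAE, RMSE and R² between experimental and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,
    }

# Hypothetical experimental vs. predicted melting points (K).
print(error_stats([350.0, 410.0, 298.0], [345.0, 418.0, 300.0]))
```

Reporting these statistics separately for the training, internal test and external test sets allows the models to be compared on equal terms.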
Keywords/Search Tags: Melting points, Sample pattern recognition methods, Multiple linear regression, Partial least squares, Artificial neural network, Structural similarity, QSPR