Application of a Group of New Amino Acid Descriptors in Peptide Quantitative Structure-Activity Relationship

Posted on: 2022-05-06 | Degree: Master | Type: Thesis
Country: China | Candidate: T Wu | Full Text: PDF
GTID: 2480306524974019 | Subject: Neurobiology
Abstract/Summary:
Quantitative structure-activity relationship (QSAR) analysis applies chemometric methods to drug design and discovery by linking descriptors of chemical structure to biological activity. Because determining the properties of large numbers of proteins or peptides experimentally is slow and expensive, computational methods such as QSAR analysis have been introduced to address the problem. Amino acid descriptors are quantitative values that describe the topological properties, physicochemical properties, three-dimensional structure, or other properties of amino acids, and peptide QSAR research mainly relies on them to characterize peptides. Many QSAR studies have appeared in recent years, and some of their descriptors are derived from the AAindex database of amino acid physicochemical properties, which has recently been updated. Physicochemical property descriptors model better than other descriptor types, and they have a clear meaning and are easy to interpret. To predict the relationship between structural change and biological activity more accurately, we therefore collected a total of 566 amino acid parameters from the AAindex database and modeled with the recently developed Gaussian process regression and random forest regression together with four more commonly used regression methods.

We collected 566 physicochemical property parameters of the natural amino acids from the AAindex database, together with the peptide sequences and experimental activity values of five peptide sample sets: bitter dipeptides, angiotensin-converting enzyme (ACE) inhibitors, bradykinin-potentiating peptides, oxytocin, and antibacterial peptides. The 566 properties were divided into four categories: hydrophobic, steric, electronic, and composition properties. Principal component analysis then yielded a set of new amino acid descriptors: H5, S8, E7, C5, and V9. The five descriptors were used to characterize the five peptide sample sets, which were then divided into training and test sets at a ratio of 2:1. Six machine learning methods were used for QSAR modeling and compared: multiple linear regression (MLR), partial least squares regression (PLS), support vector machine regression (SVM), least squares support vector machine regression (LSSVM), random forest regression (RF), and Gaussian process regression (GP). Leave-one-out cross-validation was used for internal validation and the test set for external validation to ensure the validity of the models. Each model was evaluated for fitting ability, stability, and, most importantly, predictive ability using the following statistics: training-set fitting coefficient R², root mean square error of estimation RMSEE, cross-validated fitting coefficient R²cv, cross-validated root mean square error RMSCV, test-set prediction correlation coefficient R²pred, external validation coefficient Q²ext, and root mean square error of prediction RMSEP.

Several models are clearly better in fit, stability, and predictive ability than models built with previous descriptors: E7-BTD-MLR, E7-BTD-PLS, E7-BTD-GP, S8-BTD-GP, and V9-ACE-SVM. Their statistics, in the order R², RMSEE, R²cv, RMSCV, R²pred, Q²ext, RMSEP, are:
(1) E7-BTD-MLR: 0.946, 0.140, 0.796, 0.273, 0.913, 0.915, 0.193
(2) E7-BTD-PLS: 0.946, 0.141, 0.831, 0.249, 0.918, 0.919, 0.188
(3) E7-BTD-GP: 0.943, 0.145, 0.830, 0.249, 0.929, 0.930, 0.175
(4) S8-BTD-GP: 0.925, 0.166, 0.736, 0.311, 0.902, 0.903, 0.206
(5) V9-ACE-SVM: 0.903, 0.310, 0.790, 0.457, 0.939, 0.939, 0.243

Applying this group of amino acid descriptors to bitter dipeptides, ACE inhibitors, and oxytocin gave relatively good results:
(1) C5 descriptor: for bitter dipeptides, the MLR, PLS, and GP models perform well; for ACE inhibitors, the SVM model is very good.
(2) E7 descriptor: for bitter dipeptides, the MLR, PLS, and GP models give very good results, and the SVM and LSSVM models are also relatively good; for ACE inhibitors, the RF model performs well.
(3) H5 descriptor: for bitter dipeptides, the MLR, PLS, and GP models perform well; for ACE inhibitors, the PLS, GP, and SVM models perform well; for oxytocin, the GP model gives very good results and the PLS model is also good.
(4) S8 descriptor: for bitter dipeptides, the GP model gives very good results, and the MLR, PLS, SVM, and LSSVM models are also good; for ACE inhibitors, the MLR, GP, and SVM models perform well; for oxytocin, the MLR, PLS, and GP models perform well.
(5) V9 descriptor: for bitter dipeptides, the MLR, PLS, GP, and SVM models perform well; for ACE inhibitors, the SVM model gives excellent results and the other five methods also perform well; for oxytocin, the MLR, PLS, and GP models perform well.
For bradykinin-potentiating peptides and antibacterial peptides, none of the six methods gave satisfactory models.

The five amino acid descriptors differ in their applicability to the five peptide sample sets. For bitter dipeptides, the E7 descriptor works best, and PLS and GP are the best modeling methods overall; for ACE inhibitors, the H5 descriptor works best, with SVM as the best modeling method. Models built from this group of physicochemical property descriptors with the GP, PLS, MLR, and SVM methods are generally more effective, and the most suitable modeling method depends on both the amino acid descriptor and the peptide sample set.
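The descriptor construction summarized in the abstract (principal component analysis over one category of amino acid properties, then characterizing each peptide residue by residue) might look roughly like the following. This is a minimal sketch, not the thesis code: the property matrix is random placeholder data rather than real AAindex values, the function names `pca_descriptor` and `encode_peptide` are hypothetical, autoscaling before PCA is an assumption, and reading a name such as H5 as "five principal components" is also an assumption.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids, one-letter codes


def pca_descriptor(properties: np.ndarray, n_components: int) -> np.ndarray:
    """Return PCA scores (20 x n_components) for a 20 x n_properties matrix."""
    # Autoscale each property column to zero mean and unit variance (assumed step).
    centered = properties - properties.mean(axis=0)
    scaled = centered / properties.std(axis=0, ddof=1)
    # SVD of the scaled matrix: the columns of U * S are the principal component scores.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def encode_peptide(sequence: str, descriptor: np.ndarray) -> np.ndarray:
    """Concatenate the descriptor rows of each residue into one feature vector."""
    rows = [descriptor[AMINO_ACIDS.index(residue)] for residue in sequence]
    return np.concatenate(rows)


# Placeholder data: random numbers standing in for one category of properties.
rng = np.random.default_rng(0)
fake_properties = rng.normal(size=(20, 40))  # 20 amino acids x 40 hypothetical properties

descriptor = pca_descriptor(fake_properties, n_components=5)  # a 5-component descriptor
features = encode_peptide("VW", descriptor)  # a dipeptide gives 2 x 5 = 10 features
print(descriptor.shape, features.shape)
```

With this encoding, every peptide of the same length maps to a feature vector of the same size, which is what lets a fixed-length sample set such as the dipeptides be passed directly to a regression method.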
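The validation scheme described in the abstract (a 2:1 training/test split, leave-one-out cross-validation on the training set, and external prediction on the test set) can be illustrated with multiple linear regression, the simplest of the six methods. The sketch below uses synthetic data, a random split, and common textbook forms of the statistics; the thesis may define them differently (for example, R²pred is taken here as the squared Pearson correlation on the test set, and the external coefficient is computed relative to the training-set mean), so all of those choices are assumptions.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit of y = b0 + X b; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict with an intercept-first coefficient vector from fit_mlr."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r2(y_true, y_pred, y_ref_mean):
    """1 - SS_res / SS_tot, with SS_tot taken around y_ref_mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_ref_mean) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root mean square error without a degrees-of-freedom correction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def loo_predictions(X, y):
    """Leave-one-out: predict each sample from a model fitted on all the others."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = predict_mlr(fit_mlr(X[keep], y[keep]), X[i:i + 1])[0]
    return preds


# Synthetic stand-in data: 48 "peptides" x 4 features, linear signal plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(48, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.8]) + rng.normal(scale=0.2, size=48)

# 2:1 split into training and test sets (random here; the split rule is an assumption).
order = rng.permutation(len(y))
train, test = order[:32], order[32:]

coef = fit_mlr(X[train], y[train])
fit_pred = predict_mlr(coef, X[train])
cv_pred = loo_predictions(X[train], y[train])
test_pred = predict_mlr(coef, X[test])

y_tr_mean = y[train].mean()
stats = {
    "R2": r2(y[train], fit_pred, y_tr_mean),
    "RMSEE": rmse(y[train], fit_pred),
    "R2cv": r2(y[train], cv_pred, y_tr_mean),
    "RMSCV": rmse(y[train], cv_pred),
    "R2pred": float(np.corrcoef(y[test], test_pred)[0, 1] ** 2),
    "Q2ext": r2(y[test], test_pred, y_tr_mean),
    "RMSEP": rmse(y[test], test_pred),
}
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Reading the three groups of statistics together follows the abstract's framing: the training-set values indicate fit, the leave-one-out values indicate stability, and the test-set values indicate predictive ability. A large gap between R2 and R2cv would suggest overfitting.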
Keywords/Search Tags:quantitative structure-activity relationship, peptide, amino acid descriptor, principal component analysis, modeling method