Machine Learning Modeling And Prediction Of Domain-Peptide Recognition Affinity Based On Quantitative Structure-Activity Relationship

Posted on: 2022-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
GTID: 2480306764969279
Subject: Automation Technology
Abstract/Summary:
Protein-protein association in cellular signaling networks (CSNs) often takes the form of weak, transient, and reversible domain-peptide interactions (DPIs), in which a flexible peptide segment on the surface of one protein is recognized and bound by a rigid peptide-recognition domain of another. Reliable modeling and accurate prediction of DPI binding affinities would help to clarify the diverse biological events involved in CSNs and improve our understanding of the biological implications of DPIs. Traditionally, peptide quantitative structure-activity relationship (pQSAR) modeling has been widely used to model and predict the biological activity of oligopeptides: amino acid descriptors (AADs) characterize peptide structures at the sequence level, and the resulting descriptor vector is then statistically correlated with observed activity data via regression. However, QSAR has not yet been widely applied to the direct binding of large-scale sets of peptide ligands to their protein receptors. In addition, three-dimensional pQSAR lacks a structural characterization method for biomacromolecular systems such as protein-peptide complexes, so the intermolecular interactions between the protein receptors and peptide ligands of such complexes cannot be characterized at the three-dimensional structure level.

In this thesis, pQSAR was systematically applied to the modeling and prediction of DPI binding affinities, with models established by characterizing peptide ligands at both the sequence and structure levels. At the sequence level, over twenty thousand short linear motif (SLiM)-containing peptide segments involved in SH3, PDZ and 14-3-3 domain-mediated CSNs were compiled from the literature and databases to define a comprehensive sequence-based data set of DPI affinities. The affinities were expressed in Boehringer light units (BLUs) derived from previous arbitrary light intensity assays following SPOT peptide synthesis. Four sophisticated machine learning methods (MLMs) were then used to perform pQSAR modeling on this set, described with different AADs, to systematically create a variety of linear and nonlinear predictors, which were then validated by rigorous statistical tests. The results reveal that genome-wide DPI events can be modeled only qualitatively or, at best, semiquantitatively with the traditional pQSAR strategy, owing to the high flexibility and intrinsic disorder of peptide conformation and the potential interplay between different peptide residues. In addition, unlike quantitative affinity indicators such as Kd and ΔG, which are more suitable for pQSAR modeling, the arbitrary BLU values used to characterize DPI affinities were measured only indirectly; they may not be very reliable and may carry strong noise, which introduces considerable bias into the modeling and results in only moderate fitting ability, internal stability and generalization capability for the pQSAR models obtained by the different MLMs. R²prd = 0.7 can be considered the upper limit of the external generalization ability of the pQSAR methodology on large-scale DPI affinity data.

At the structure level, we proposed a novel three-dimensional pQSAR method based on protein-peptide complex structures and affinity data, namely comparative protein-peptide interaction analysis (CoPPIA). From the Protein Data Bank (PDB) and the literature we collected 171 protein-peptide complex structures together with their affinities expressed as dissociation constants (Kd). Partial least squares (PLS) regression was employed to establish the multivariate statistical relationship between the CoPPIA-characterized descriptors and the complex binding affinities. A variety of three-dimensional pQSAR models were thus built and then evaluated by rigorous internal and external validation. The results reveal that although the Kd values of the samples are more accurate, the resulting CoPPIA-based pQSAR models exhibit only moderate modeling and prediction performance; the predictive coefficient of determination still takes a number of negative values even when the structures are characterized with combinations of three amino acid property types, i.e. polarity, hydrophobicity and stericity. R²prd = 0.4 is assigned as the upper limit of the predictive ability of full-parameter CoPPIA models. To tackle this issue, we further employed variable selection to exclude strongly noisy and statistically insignificant variables from the CoPPIA-characterized descriptors. As a result, the pQSAR predictors improved substantially, with a considerable increase in internal stability and external predictive power; such variable selection-based CoPPIA models could be further applied for practical purposes.
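To make two ideas from the abstract concrete, the following is a minimal Python sketch, not the thesis's implementation. It shows how a peptide sequence can be turned into a descriptor vector and how a predictive coefficient of determination can become negative. The abstract does not say which AADs were used, so the Kyte-Doolittle hydropathy index serves here only as a stand-in descriptor. The function names `encode_peptide` and `r2_pred` are hypothetical, and the statistic follows one common convention (residual sum of squares on the external set divided by the sum of squares around the training-set mean), which may differ from the definition used in the thesis.

```python
# Kyte-Doolittle hydropathy index, used only as a stand-in amino acid
# descriptor (assumption: the abstract does not name the AADs it used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(seq):
    """Map a peptide to a descriptor vector, one value per residue position."""
    return [HYDROPATHY[res] for res in seq.upper()]


def r2_pred(y_obs, y_hat, y_train_mean):
    """Predictive R^2 = 1 - PRESS / SS on an external test set.

    PRESS is the sum of squared prediction errors; SS is the sum of squared
    deviations of the observed test values from the training-set mean.
    """
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_hat))
    ss = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss


if __name__ == "__main__":
    print(encode_peptide("PPLP"))                      # descriptor vector
    print(r2_pred([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], 2.0))  # negative value
```

Under this convention, a model that predicts worse than simply returning the training-set mean yields a negative value, which is how negative predictive coefficients of the kind mentioned for the full-parameter models can arise.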
Keywords/Search Tags: Quantitative Structure-Activity Relationships, Domain-Peptide Interactions, Amino Acid Descriptors, Machine Learning, Affinity