
Quantitative Sequence-Activity Model Analysis Of Pharmacodynamics Peptides

Posted on: 2015-03-02 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L F Wang
GTID: 1220330470952241 | Subject: Pesticides
Abstract/Summary:
Peptides play an irreplaceable role in most physiological and biochemical functions of living organisms. In recent years, the development of pharmacodynamic peptides has received wide attention, especially in biopharmaceuticals. Compared with traditional drugs or bactericides, peptide drugs have the advantages of low molecular weight, good thermal stability and high specificity, and they rarely induce immunogenicity. However, the bioactivities of most existing active peptides are far from ideal and cannot meet the demands of practical applications. It is therefore of both theoretical significance and practical value to remodel existing pharmaceutical peptides and, further, to optimize the design of new peptides with higher activity. The spatial structure and biological function of peptides and proteins are essentially determined by their primary structure, i.e., the amino acid sequence. Higher-order spatial structure is hard to determine and difficult to use as direct guidance for synthesis, whereas the primary sequence of a peptide or protein is simple and convenient to determine. Hence, the quantitative sequence-activity model (QSAM), built on the primary sequence, has become an effective means of predicting and designing high-activity peptides. Taking support vector regression (SVR) as the basic modeling tool, this work improves the QSAM model in four crucial modeling aspects (structure characterization, feature selection, individual prediction and model interpretation) and obtains an optimized model that makes efficient, stable predictions and provides clear guidance for structural optimization and reconstruction. The main work includes:
1. Sequence structure characterization: how to effectively transform the primary sequence of a peptide into numerical descriptors that a statistical model can recognize.
This work used the 531 physico-chemical properties of the natural amino acids (AAs) as descriptors (named AA531) for a comprehensive, integrated characterization of peptide sequences. Furthermore, geostatistics (GS) and multi-scale components (MSC) were introduced to construct the novel descriptors GS-AA531 and AA531-MSC, which take the contextual association along the peptide sequence into account and overcome the shortcoming that AA531 is unsuitable for systems containing peptides of unequal length.
2. Feature selection: not all features are useful for modeling, and redundant or useless features can harm model precision and stability. For the high-dimensional feature sets generated by the AA531-series descriptors, this work constructed a filter named the binary matrix resetting filter (BMRF) for fast, nonlinear feature reduction, followed by fine selection with our previously developed multi-round last-elimination (MRLE) method, finally retaining a small number of features with definite statistical significance.
3. Individual prediction: peptides with similar molecular structures and physicochemical properties often have similar activities.
This work is the first to propose "individual prediction for active peptides": based on the GS semivariance function, near-neighbor samples are selected for each test peptide to form its own specific training set, and each test peptide is then predicted individually from the reserved descriptors and its specific training set.
4. Model interpretation: high-precision prediction and reasonable interpretation are always the two principal themes of regression modeling. To address the poor interpretability of SVR regression models, this work introduced an F-test-based significance test of the regression, a significance test of the importance of a single factor (non-linear partial regression), and single-factor effect analysis. These significantly improve the interpretability of the peptide QSAM model, which can then guide subsequent sequence optimization and reconstruction.
With the main technical approaches above, this work carried out QSAM studies on six different active peptide systems. Descriptor AA531 was first used to characterize two active peptide systems, 58 angiotensin-converting enzyme (ACE) inhibitor dipeptides and 31 bradykinin-potentiating pentapeptides (BPPs), generating 1062 and 2655 features for each dipeptide and pentapeptide sequence, respectively. The newly proposed rapid selection method BMRF was used for nonlinear selection of these high-dimensional features, followed by fine screening with MRLE; only 10 and 13 features, respectively, were finally retained. QSAM models were then built from the reserved descriptors with SVR. Compared with 16 widely used kinds of amino acid descriptors and four QSAM modeling methods, this work shows a significant improvement in modeling performance, especially in external prediction: the main evaluation indexes Q²(CV) (internal cross-validation) and Q²(ext) (external prediction) reached 0.9397 and 0.9532 for the ACE inhibitors, and 0.9488 and 0.9538 for the BPPs.
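The feature counts reported above equal 2 × 531 = 1062 and 5 × 531 = 2655, which suggests that AA531 describes a peptide by placing the 531 property values of each residue side by side in sequence order. The Python sketch below illustrates that reading under stated assumptions: `PROPERTY_TABLE` and `encode_aa531` are hypothetical names, and the table is filled with seeded random placeholder numbers rather than the real property values, so it only demonstrates the bookkeeping, not the actual descriptor values.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids
N_PROPERTIES = 531                   # properties per residue in AA531

# PLACEHOLDER values: seeded random numbers standing in for the real
# 531 physico-chemical property values of each amino acid. They only make
# this sketch runnable; substitute real, standardized values for modeling.
_rng = random.Random(0)
PROPERTY_TABLE = {
    aa: [_rng.gauss(0.0, 1.0) for _ in range(N_PROPERTIES)] for aa in AMINO_ACIDS
}


def encode_aa531(sequence):
    """Concatenate per-residue property vectors, position by position.

    A peptide of length L becomes a vector of L * N_PROPERTIES numbers.
    """
    features = []
    for residue in sequence.upper():
        if residue not in PROPERTY_TABLE:
            raise ValueError(f"unknown residue: {residue!r}")
        features.extend(PROPERTY_TABLE[residue])
    return features


print(len(encode_aa531("VY")))     # dipeptide: 2 * 531 = 1062
print(len(encode_aa531("WPRPQ")))  # pentapeptide: 5 * 531 = 2655
```

Because the vector length grows with the number of residues (a 14-residue peptide would give 7434 features and a 19-residue one 10089), position-wise concatenation cannot place peptides of unequal length in one common feature space, which is the AA531 limitation that GS-AA531 and AA531-MSC are meant to address.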
Meanwhile, the introduced interpretation system revealed that the activity of the ACE inhibitor dipeptides correlates significantly with 9 specific physico-chemical properties, including the relative preference value at N1 and the free energies of transfer of AcW1-X-LL peptides, and that the third residue is the most important for the activity of BPPs. These findings provide clear, definite guidance for the subsequent design and optimization of active peptides.
On the basis of descriptor AA531, this work extracted structural information from the whole peptide sequence to construct descriptors GS-AA531 and AA531-MSC by introducing GS and MSC, and applied these two descriptors to characterize two antimicrobial peptide (AMP) sets: 101 cationic AMPs (each sequence 15 AA long) and 34 AMPs of unequal length (14-19 AA). The generated features were selected with BMRF and MRLE, and the reserved ones were used to build QSAM models. Results of fitting, leave-one-out validation and independent testing all confirmed that the two novel descriptors have advantages in capturing contextual correlation and in characterizing peptide systems that contain sequences of unequal length.
In the QSAM studies of two other active peptide systems, 55 ACE inhibitor tripeptides and 177 HLA-A*0201-restricted CTL epitopes, this work first proposed the method "individual prediction for active peptides". After sequence characterization and feature selection, each reserved feature was given a corresponding weight; with the introduction of the GS semivariance function, the GS range was determined on the basis of the weighted Euclidean distance; taking the range as a threshold, different near-neighbor samples were selected for each test peptide to form its specific training set; and each test sample was predicted individually with the reserved features and its specific training set.
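The abstract does not give the exact procedure used to determine the GS range from the weighted Euclidean distance. One simple geostatistical heuristic, shown here only as a hedged sketch, is to build an empirical semivariogram of the activities against weighted distance, γ(h) = mean of ½(y_i − y_j)² over sample pairs whose distance falls in lag bin h, and to read the range off as the first lag at which γ approaches the sill. The function names, the fixed-width binning, the use of the activity variance as the sill and the 95% cut-off are all assumptions rather than the dissertation's settings; fitting a theoretical semivariogram model would be the more rigorous alternative.

```python
import math
from collections import defaultdict
from statistics import pvariance


def weighted_euclidean(x, y, weights):
    """d(x, y) = sqrt(sum_j w_j * (x_j - y_j)^2) over the reserved features."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def empirical_semivariogram(X, activity, weights, bin_width):
    """Bin all sample pairs by weighted distance; return sorted (lag, gamma).

    gamma = mean of 0.5 * (activity_i - activity_j)^2 over the pairs in a bin;
    lag   = mean weighted distance of those pairs.
    """
    bins = defaultdict(list)  # bin index -> [(distance, half squared difference)]
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            d = weighted_euclidean(X[i], X[j], weights)
            half_sq = 0.5 * (activity[i] - activity[j]) ** 2
            bins[int(d // bin_width)].append((d, half_sq))
    curve = []
    for k in sorted(bins):
        pairs = bins[k]
        lag = sum(d for d, _ in pairs) / len(pairs)
        gamma = sum(g for _, g in pairs) / len(pairs)
        curve.append((lag, gamma))
    return curve


def estimate_range(X, activity, weights, bin_width, fraction=0.95):
    """Heuristic range: first lag whose gamma reaches `fraction` of the sill.

    The sill is approximated by the population variance of the activities.
    """
    sill = pvariance(activity)
    curve = empirical_semivariogram(X, activity, weights, bin_width)
    for lag, gamma in curve:
        if gamma >= fraction * sill:
            return lag
    return curve[-1][0]  # sill never reached: fall back to the largest lag


# Toy one-feature data (not real peptides), just to exercise the functions.
X_demo = [[0.0], [1.0], [2.0], [3.0], [4.0]]
act_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
print(empirical_semivariogram(X_demo, act_demo, [1.0], 1.0))
print(estimate_range(X_demo, act_demo, [1.0], 1.0))  # 2.0
```

The weights enter only through the distance, so a feature judged more important stretches the space along its axis and shrinks the neighborhood that a given range admits.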
For active peptide systems, feature elimination and near-neighbor sample selection optimize the data matrix comprehensively in two directions: columns (features) and rows (samples). The QSAM results (especially for external prediction) on both active peptide systems confirmed the effectiveness of this combined method.
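The column-and-row idea can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: `individual_predict` is a hypothetical helper, `radius` stands for the GS range used as threshold, and by default the local model is simply the mean activity of the selected neighbors, a stand-in for the SVR used in the work.

```python
import math


def weighted_euclidean(x, y, weights):
    """d(x, y) = sqrt(sum_j w_j * (x_j - y_j)^2); one weight per reserved feature."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def individual_predict(query, X, y, reserved, weights, radius, local_model=None):
    """Predict one test peptide from its own neighbor-based training set.

    query / rows of X: full feature vectors; reserved: indices of kept features;
    weights: one weight per reserved feature; radius: distance threshold.
    """
    # Column direction: keep only the reserved features.
    q = [query[j] for j in reserved]
    X_cols = [[row[j] for j in reserved] for row in X]
    # Row direction: keep only training samples within the radius of this query.
    rows = [i for i, row in enumerate(X_cols)
            if weighted_euclidean(q, row, weights) <= radius]
    if not rows:
        raise ValueError("no training sample within the radius")
    X_local = [X_cols[i] for i in rows]
    y_local = [y[i] for i in rows]
    if local_model is None:
        # Stand-in for SVR: mean activity of the selected neighbors.
        return sum(y_local) / len(y_local)
    return local_model(X_local, y_local, q)


# Toy data (not real peptides); column 1 is treated as an eliminated feature.
X_demo = [[0.0, 9.0, 0.0], [1.0, -3.0, 0.0], [0.0, 7.0, 1.0], [5.0, 0.0, 5.0]]
y_demo = [1.0, 2.0, 3.0, 10.0]
print(individual_predict([0.0, 100.0, 0.0], X_demo, y_demo, [0, 2], [1.0, 1.0], 1.5))  # 2.0
print(individual_predict([5.0, 0.0, 5.0], X_demo, y_demo, [0, 2], [1.0, 1.0], 1.5))    # 10.0
```

In the example, discarding the unreserved column 1 is what makes the first three samples neighbors of the first query; each test peptide therefore gets its own training rows. Assuming scikit-learn is installed, an SVR could be plugged in with `from sklearn.svm import SVR` and `local_model=lambda Xl, yl, q: SVR().fit(Xl, yl).predict([q])[0]`, though its kernel and hyperparameters would still need tuning.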
Keywords/Search Tags: pharmacodynamics peptide, quantitative sequence-activity model, support vector regression, descriptor, feature selection, individual prediction, model interpretation