The quantitative sequence-activity model (QSAM), a novel branch of quantitative structure-activity relationship (QSAR) research, builds structure-activity relationships starting from the primary sequence of a biomolecule and describes them with suitable functions, with the aim of predicting unknown properties and functions and guiding structural modification. This paper focuses on structural characterization approaches for QSAMs and reports a series of results on characterization methods for peptide-protein interactions, nucleotide sequence units and whole sequences, ligand-receptor interaction modes, pseudo vaccine library design, simulation of hydrophobic chromatographic retention behavior, and prediction of higher-order peptide structures. The main results are as follows:
① Principal component analysis (PCA) of 36 kinds of property parameters for the natural bases yields one principal component score, which serves as the information descriptor for a single base and is termed the principal component score vector of base property variables (VBPV). The primary structures of 38 E. coli promoter sequences are represented with VBPV descriptors and, in combination with multiple statistical methods, a QSAM is built between the resulting representation parameters and transcription promoter strength. The model performs well, with r = 0.9712 and q = 0.9515. On this basis, target sequences are subjected to site-directed mutagenesis, yielding 5 novel E. coli promoters with potent promoting activity; this result remains to be tested experimentally.
② Derived from 149 hydrophobicity factors of the 20 natural amino acids, a novel amino acid descriptor termed the generalized hydrophobicity scale (GH-scale) was proposed using principal component analysis (PCA).
Using the genetic algorithm-partial least squares (GA-PLS) method, a QSAR model based on the GH-scale was constructed for 152 human leukocyte antigen HLA-A*0201-restricted cytotoxic T lymphocyte (CTL) epitopes, with fitted and cross-validated correlation coefficients of r² = 0.813 and q² = 0.725, respectively. The model indicates that hydrophobic interaction plays an important role in the HLA-A*0201-CTL interaction, most prominently at the anchor residues.
③ Common atoms in organic compounds are typed by their families in the periodic table and by their hybridization states, and a new rotation- and translation-invariant 3D-MSC method, the three-dimensional holograph vector of atomic interaction field (3D-HoVAIF), is obtained by calculating three kinds of non-bonding interactions: electrostatic, van der Waals and hydrophobic. 3D-HoVAIF is applied in systematic QSAR studies of two classical peptide sample sets, namely 58 angiotensin-converting enzyme (ACE) inhibitors and 48 bitter-tasting dipeptides. The resulting GA-PLS models outperform most reference reports, with r², q², RMSEE and RMSCV of 0.857, 0.811, 0.376 and 0.432 for the ACE inhibitors, and 0.940, 0.892, 0.153 and 0.205 for the bitter-tasting dipeptides, respectively. After splitting each sample set evenly into training and test sets by the D-optimal method, we perform rigorous statistical validation of the 3D-HoVAIF descriptor and compare it with two classical amino acid indices, z-scale and ISA-ECI. Furthermore, the models are used to predict the ACE-inhibiting activities and bitter-tasting thresholds of all 400 theoretically possible dipeptides, and correlation analysis shows that the ACE-inhibiting activities of dipeptide-like compounds correlate markedly with their bitter-tasting intensities.
It is therefore difficult to find a dipeptide that combines a satisfactory pharmacodynamic action (high ACE-inhibiting activity) with a pleasant taste, which suggests that dipeptides are not ideal active components for functional foods intended to lower blood pressure.
④ By defining the residue types in the HLA-A*0201 protein that are in direct contact with each position of an HLA-A*0201-restricted CTL epitope, together with several non-bonding ligand/receptor interactions, a non-bonding interaction matrix of 4 types is constructed, on which a structure-based quantitative structure-activity relationship (SBQSAR) study is performed for 266 HLA-A*0201-restricted CTL epitopes. The resulting GA-PLS model agrees well with reference reports and with molecular graphics demonstrations. Hydrophobic and hydrogen-bond interactions are found to play important roles in antigen recognition and presentation, especially at the anchor residue positions of the antigen peptides.
⑤ Based on a QSAM-based virtual vaccine library scheme, a reasonable library of human leukocyte antigen HLA-A*0201-restricted CTL epitopes is designed in this paper.
The process is as follows: 1) a synthetical property score (SP-score) of the amino acids is derived from 516 physicochemical properties by principal component analysis (PCA); 2) based on the SP-score, genetic algorithm-partial least squares regression (GA-PLS) is used to construct the quantitative sequence-activity model (QSAM); 3) with the QSAM as the evaluation tool, CTL epitopes are optimized by a genetic algorithm (GA); 4) the frequency f of each amino acid at each position of the best antigen peptides is calculated; 5) any amino acid with f > F (where F is the random probability, 1/20 for the natural amino acids) is retained as a useful residue type at that position and as a structural element of the combinatorial library.
⑥ The interaction between proteins and the stationary phase in hydrophobic interaction chromatography (HIC) is separated into two thermodynamic processes, the direct nonbonding/conformation interaction and the surface hydrophobic effect of the protein, which quantitatively gives a two-variable linear relation between the HIC retention time (RT) at high salt concentration and the ligand-protein binding free energy. The possible binding modes of 27 proteins of known crystal structure with hydrophobic ligands are then simulated and analyzed by ICM flexible molecular docking and a genetic algorithm, with results in good agreement with experimental values.
The investigation confirms that both the local hydrophobic effect of the protein and the nonbonding/conformation interaction between ligand and protein notably influence HIC retention behavior, and that both act mainly through the exposed portions of the protein surface.
⑦ Protein primary sequences are first characterized in vector form by the SP-score. An auto-correlation function is then defined on these vectors and nonlinearly transformed, with respect to the space in which it is calculated, by introducing the Mercer kernel technique, which yields a new peptide sequence characterization method: the kernel sequence auto-correlation function (KSACF). Applying KSACF to classification studies of 632 nonhomologous proteins with known crystal structures indicates that KSACF can appropriately extract the features of protein primary sequences and uncover internal relations hidden among different amino acid residues, thus enabling accurate simulation and prediction of different protein structures.
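
The idea behind a kernel-based sequence auto-correlation, as described in ⑦, can be illustrated with a minimal sketch. The exact KSACF formula is not given here, so the form below is an assumption: residues are mapped to descriptor vectors, and for each lag the ordinary product between residues i and i + lag is replaced by a Gaussian (RBF) kernel, which is one common Mercer kernel. The descriptor values in `TOY_DESCRIPTORS` are invented placeholders, not real SP-score values, and the names `rbf_kernel`, `kernel_autocorrelation`, `max_lag` and `gamma` are hypothetical.

```python
import math

# Hypothetical 2-component descriptors for four residues.
# These are placeholders for illustration, NOT real SP-score values.
TOY_DESCRIPTORS = {
    "A": (0.1, -0.3),
    "G": (-0.2, 0.4),
    "L": (0.9, -0.1),
    "K": (-0.7, 0.8),
}


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def kernel_autocorrelation(sequence, max_lag, gamma=1.0):
    """Return one averaged kernel value per lag 1..max_lag.

    Assumed form: AC(lag) = mean over i of K(x_i, x_{i+lag}).
    The output length depends only on max_lag, so sequences of
    different lengths map to vectors of the same length.
    """
    vectors = [TOY_DESCRIPTORS[res] for res in sequence]
    n = len(vectors)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(
            rbf_kernel(vectors[i], vectors[i + lag], gamma)
            for i in range(n - lag)
        )
        features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    # Two sequences of different length give feature vectors of equal length.
    print(kernel_autocorrelation("AGLKAG", max_lag=3))
    print(kernel_autocorrelation("AGLKAGLLKA", max_lag=3))
```

Fixed-length vectors of this kind can be passed to any standard classifier; the choice of kernel, its width `gamma`, and the maximum lag are tuning decisions that this sketch leaves open.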