
Protein Secondary Structure Assignment And Function Analysis

Posted on: 2017-01-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Cao
Full Text: PDF
GTID: 1220330482497008
Subject: Computer Science and Technology
Abstract/Summary:
Secondary structure refers to local regular sub-structures on the protein backbone. The accurate assignment of protein secondary structure from atomic coordinates underlies the analysis of protein structure and function. It is also important for protein classification, finding functional motifs, and understanding folding mechanisms, as well as for molecular visualization, protein comparison, and prediction. Thus, protein secondary structure assignment remains an active research field in structural bioinformatics. More than twenty secondary structure assignment methods have been developed; they are generally categorized into two groups, geometry-based and hydrogen bond-based. However, the agreement among the secondary structures assigned by different methods is recognized as relatively low. For the helix, one of the two main secondary structure elements in proteins, the discrepancies and inconsistencies among methods may well originate from imprecise problem definitions: instead of rigorously following the helix geometry, the methods formulate assignment as a restraint satisfaction problem whose restraints either cannot be computed accurately (e.g., hydrogen bonds), have no precise range (e.g., φ/ψ angles), or are not sufficient to define a general helix curve (e.g., virtual Cα bond lengths and angles). DSSP serves as the “gold standard” in the field even though its hydrogen bonds are based on inaccurately calculated Coulomb energies and approximated hydrogen positions (with error), and overlapping hydrogen bond patterns cause DSSP to assign some secondary structures with irregular geometry.
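To make the remark about DSSP's Coulomb energies and approximated hydrogen positions concrete, the following is a minimal Python sketch of the widely documented DSSP hydrogen bond criterion: partial charges of 0.42e and 0.20e, a conversion factor of 332, and a cutoff of -0.5 kcal/mol. The function names and the toy coordinates in the usage note are illustrative assumptions, not code from the dissertation or from the DSSP program itself.

```python
import math

# Product of the DSSP partial charges (0.42e on C=O, 0.20e on N-H) and the
# factor 332 that yields kcal/mol when distances are in angstroms.
Q1_Q2_F = 0.42 * 0.20 * 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; energies below this count as a hydrogen bond


def place_amide_hydrogen(n, prev_c, prev_o, bond_length=1.0):
    """Approximate the amide H position: bond_length from N, parallel to the
    O -> C direction of the preceding residue's carbonyl group."""
    direction = [c - o for c, o in zip(prev_c, prev_o)]
    length = math.sqrt(sum(d * d for d in direction))
    return tuple(ni + bond_length * d / length for ni, d in zip(n, direction))


def dssp_hbond_energy(c, o, n, h):
    """Electrostatic energy between a C=O group and an N-H group."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1_Q2_F * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def is_hbond(c, o, n, h):
    """Apply the energy cutoff to decide whether the pair is hydrogen bonded."""
    return dssp_hbond_energy(c, o, n, h) < HBOND_CUTOFF
```

For example, with hypothetical collinear coordinates `c=(0, 0, 0)`, `o=(1.23, 0, 0)`, `h=(3.13, 0, 0)`, `n=(4.13, 0, 0)`, the energy falls well below the cutoff, so `is_hbond` returns `True`. Because the hydrogen position is itself an estimate, small coordinate errors propagate directly into the energy, which is the weakness noted above.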
STRIDE, another popular secondary structure assignment method, excludes residues with outlier backbone dihedral angles, using statistically derived backbone torsion angle information from helices and strands assigned by DSSP, even when these residues form the appropriate hydrogen bond pattern. However, STRIDE employs only very local φ/ψ angle information to detect single residues with outlier φ/ψ values and make assignments more uniform; it does not treat the secondary structure fragment as a whole. Several secondary structures and motifs, such as the left-handed helix and the π-helix, are known to occur preferentially at functional sites in proteins, especially ligand sites. Furthermore, we find that different regions of the Ramachandran plot show different preferences with respect to protein-ligand binding sites. Detecting ligand binding sites in proteins is a hot topic in bioinformatics, yet no one has applied information from residue dihedral angles to protein-ligand binding site prediction.

The main contents of this dissertation are as follows:

1) A new helix assignment algorithm, HELIX-F, based on helix geometry.

We describe a novel algorithm for assigning the helices in a protein from its Cα coordinates. Our algorithm, though belonging to the category of geometric restraint-based programs, differs substantially from the other programs in that it relies on fitting backbone Cα atoms to a series of genuine helical curves. It consists of two steps. The first step searches for a series of bona fide helical curves, each one best fitting the coordinates of four successive Cα atoms. The second step uses these best-fit curves as input to make a helix assignment. The results show that HELIX-F can accurately assign not only regular α-helices but also PPII, 3₁₀, and π-helices, as well as their left-handed versions.
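The abstract does not give the fitting procedure used by HELIX-F, so the following Python sketch is only an illustration of how helix parameters can be estimated from four successive Cα positions. It uses a standard second-difference construction: on an ideal helix, the vector P(i-1) + P(i+1) - 2P(i) points radially toward the axis. The function name `helix_from_four_points` and the returned parameter set are assumptions made for this example, and the result is exact only when the four points lie on an ideal helix.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def helix_from_four_points(p0, p1, p2, p3):
    """Estimate (radius, twist in degrees, signed rise) from four points.

    The rise is positive for a right-handed helix and negative for a
    left-handed one, because the axis direction is taken from the cross
    product of the two inward-pointing radial vectors.
    """
    v1, v2, v3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    b1 = _sub(v2, v1)  # radial vector at p1, pointing toward the axis
    b2 = _sub(v3, v2)  # radial vector at p2, pointing toward the axis
    n = _cross(b1, b2)
    n_len = _norm(n)
    if n_len < 1e-9:
        raise ValueError("points are (nearly) collinear or planar-degenerate")
    axis = tuple(x / n_len for x in n)
    cos_twist = max(-1.0, min(1.0, _dot(b1, b2) / (_norm(b1) * _norm(b2))))
    twist = math.degrees(math.acos(cos_twist))
    # |b| = 2 r (1 - cos(twist)) on an ideal helix; the geometric mean of
    # |b1| and |b2| tolerates small deviations from ideality.
    radius = math.sqrt(_norm(b1) * _norm(b2)) / (2.0 * (1.0 - cos_twist))
    rise = _dot(v2, axis)
    return radius, twist, rise
```

Sliding this estimate along the chain, one window of four Cα atoms at a time, yields a series of local helical curves whose radius, twist, and rise could then feed an assignment rule. How HELIX-F scores and combines its own best-fit curves is not stated here and is not reproduced by this sketch.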
A comparison of the assignments by our algorithm with those produced by seven other geometry-based methods shows that HELIX-F has the best agreement with the hydrogen bond-based program DSSP. Another salient feature of the algorithm is that the assigned helices are structurally more uniform than the helices assigned by DSSP.

2) An analysis of the relationship between helix score (including helix parameters) and protein structure and function.

A strong correlation exists between the π-helix assigned by HELIX-F and protein-ligand binding sites. We also found a correlation between residues’ helix scores and their locations in proteins, as well as their ability to establish hydrogen bonds: as the helix score increases, the probability that the residue is exposed to solvent increases, and consequently fewer hydrogen bonds are formed with nearby residues. The best-fit curves of successive atoms are applied in a helix model visualization of both protein and DNA molecules; an abrupt change in the polyline connecting the centers of the individual helical curves, or a twist in the helix, often occurs at a protein functional site. A distortion or twist in double-stranded DNA likewise often occurs at a protein-DNA interface. HELIX-F can also be employed to analyze structural changes during protein folding. Finally, we analyze two types of functionally important but rare helices: the left-handed helix and the PPII helix.

3) A new secondary structure assignment algorithm, SACF, using Cα fragments.

We present a novel algorithm for assigning the secondary structure of a protein from its Cα backbone fragments. SACF can be viewed as a knowledge-based secondary structure assignment program, as it is derived from Cα fragments assigned by DSSP.
SACF consists of the following steps: first, detect and exclude the “outlier” secondary structure fragments assigned by DSSP; next, derive the central fragments by clustering the remaining fragments; then, assign new fragments by aligning them with the template central fragments. SACF produces a more uniform overall secondary structure fragment assignment. We performed a large-scale comparison of 11 available methods on a database of 2,817 structures. The results show that SACF, KAKSI, and PROSS share similar agreement with DSSP, while PCASSO and STRIDE agree with DSSP best. We also analyze the terminal regions of helices and β-strands assigned by the different methods, as most disagreements arise there: taking the DSSP assignment as the standard, PCASSO and SACF tend to remove residues from both the N and C caps, whereas KAKSI, P-SEA, and PROSS tend to add residues to the two cap regions. Another salient feature of the algorithm is that the assigned helices are structurally more uniform, and different secondary structure elements can be distinguished after their Cα fragments are aligned. This structural uniformity should be useful for protein structure classification and prediction, while the “outlier” fragments detected by our algorithm underlie structure-function relationships.

4) Identification of ligand-preferring regions in the Ramachandran plot and analysis of the physicochemical properties of the residues with dihedral angles in these ligand-preferring regions.
Furthermore, we developed MF-PLB and applied it successfully to protein-ligand binding site prediction.

We found that residues with dihedral angles in certain regions of the Ramachandran plot have a strong preference for protein-ligand binding sites. We therefore performed an extensive analysis of the relationship between dihedral angles in proteins and their distance to ligand-binding sites, frequency of occurrence, molecular potential energy, amino acid composition, van der Waals contacts, and hydrogen bonds to ligands. The results show that amino acids preceding the ligand-preferring region residues are more exposed to the solvent, whereas residues following the ligand-preferring region residues form more hydrogen bonds and van der Waals contacts with ligands. Furthermore, we discovered that residues show different PLB values as the solvent accessible surface area (ASA) changes. Thus, a multiple-factor PLB (MF-PLB) that integrates the ASA and φ/ψ angles was developed. MF-PLB improved the performance of Ligsite-cs when applied to ligand binding site detection. The success rates for MF-PLB were significantly better than those of both PLB and Ligsite-csc when compared on two popular databases. In addition, the average MF-PLB of a pocket (calculated by averaging the residue MF-RA values around the pocket) should also be useful for protein-ligand binding site analysis.
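The Ramachandran analysis above rests on the backbone dihedral angles φ and ψ. As a minimal sketch, the Python code below computes a dihedral angle from four atom positions with the standard atan2 formulation and derives φ (C of the previous residue, N, Cα, C) and ψ (N, Cα, C, N of the next residue) from it. The function names are illustrative assumptions; the PLB and MF-PLB scoring functions themselves are not defined in this abstract and are not reproduced here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone phi and psi of one residue from its flanking atoms."""
    phi = dihedral(prev_c, n, ca, c)
    psi = dihedral(n, ca, c, next_n)
    return phi, psi
```

Each residue's (φ, ψ) pair can then be binned on a grid over the Ramachandran plot, so that per-region statistics such as the ligand-binding preferences described above can be accumulated.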
Keywords/Search Tags: helical curve, protein helix assignment, secondary structure assignment, cluster, outlier detection, protein Cα fragment, protein dihedral angle, protein ligand binding site prediction, analysis of protein structure and function