Navigating the extremes of biological datasets for reliable structural inference and design

Posted on:2014-04-18

Degree:Ph.D

Type:Dissertation

University:University of Pennsylvania

Candidate:Hannigan, Brett T

GTID:1450390005495720

Subject:Biology

Abstract/Summary:

Structural biologists face serious challenges in interpreting experimental data because of two opposing situations: a severe lack of structural data for some classes of proteins and an overwhelming abundance of data for others. With small data sets, the challenge is to extract enough information to draw meaningful conclusions; with large data sets, it is to curate, categorize, and search the data so that they can be meaningfully interpreted and applied to scientific problems. Here, we develop computational strategies to address both sparse and abundant data sets. For sparse data sets, we focus on transmembrane (TM) protein structure determination. Because X-ray crystallography and NMR data are notoriously difficult to obtain for TM proteins, we develop a novel algorithm that uses low-resolution data from protein cross-linking or scanning mutagenesis studies to produce models of TM helix oligomers, and we show that, for a test set of known TM proteins, the method produces models with accuracy on par with X-ray crystallography or NMR. Turning to data abundance, we examine how to mine the vast stores of protein structural data in the Protein Data Bank (PDB) to aid the design of proteins with novel binding properties. We show how identifying an anion-binding motif in an antibody structure allowed us to develop a phosphate-binding module that can be used to produce novel antibodies to phosphorylated peptides, and we illustrate the utility of the approach by creating antibodies to 7 novel phospho-peptides. We then describe a general strategy for designing binders to a target protein epitope by recapitulating protein interaction geometries that are over-represented in the PDB. We follow this by using data on amino acid transition probabilities to develop a novel set of degenerate codons for creating more efficient gene libraries. We conclude by describing a novel, real-time, all-atom structural search engine that lets researchers quickly search known protein structures for a motif of interest and provides a new interactive paradigm for protein design.
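
For readers unfamiliar with the term, a degenerate codon is a three-letter codon written with IUPAC ambiguity codes, so that one synthesized position encodes a mixture of amino acids. The sketch below is not the codon set or the design method developed in the dissertation; it only illustrates, under the standard genetic code, how a degenerate codon such as NNK expands into concrete codons and which amino acids it covers. The function names `expand` and `amino_acid_counts` are hypothetical names chosen for this illustration.

```python
from collections import Counter
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}


def expand(degenerate_codon):
    """Return every concrete codon encoded by a degenerate codon."""
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return ["".join(triplet) for triplet in product(*choices)]


def amino_acid_counts(degenerate_codon):
    """Count how many concrete codons map to each amino acid (or stop)."""
    return Counter(CODON_TABLE[codon] for codon in expand(degenerate_codon))


if __name__ == "__main__":
    for name in ("NNN", "NNK"):
        counts = amino_acid_counts(name)
        n_codons = sum(counts.values())
        n_amino = len(set(counts) - {"*"})
        print(f"{name}: {n_codons} codons, {n_amino} amino acids, "
              f"{counts['*']} stop codon(s)")
```

Comparing NNN with NNK shows the general motivation for choosing degenerate codons carefully: the same amino acids can be covered with fewer codons and fewer stop codons, which makes a gene library smaller for the same diversity.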

Keywords/Search Tags:

Data, Structural, Protein, Sets

Related items

1	Analysis of protein-protein interactions using multiple biological data sets
2	Graph-based analysis of protein-protein interaction data sets
3	Research On Classification Algorithm Of Typical Imbalanced Data Sets
4	Boundary Research And Data Filtering Based On Rough Sets
5	Research On Prediction Methods Of Protein-protein Interaction Sites
6	Precise and accurate structural genomics protein structure determination using RD and GFT NMR spectroscopy
7	The Study, Based On The Certainty Of S-rough Sets Decision Rules
8	Seismic imaging and migration velocity analysis of Alberta Foothills structural data sets
9	An Essential Protein Identification Method Based On Fusion Of Multiple Data Sources
10	P-sets And Its Applied Characteristics Research