
Computational studies characterizing the information encoded in protein structures and sequences

Posted on: 2005-12-07
Degree: Ph.D
Type: Thesis
University: University of California, San Francisco
Candidate: Aynechi, Tiba
Full Text: PDF
GTID: 2450390008977752
Subject: Biophysics
Abstract/Summary:
Proteins are often responsible for human diseases. Furthermore, their function and biological role are defined by their three-dimensional structures. The field of therapeutic design has undergone major leaps in the past two decades due to (1) our increased understanding of biological systems, (2) the availability of large amounts of sequence data, and (3) the exponential growth of computing power coupled with advances in structure-based drug design techniques.

Experimental structure determination is time-consuming and not practical for large-scale processing. Comparative modeling, which relies on sequence similarity, is often used to identify relatives of unknown proteins. This thesis begins by examining the role of distance constraints as an alternative metric for similarity when looking for fold relatives. However, we find that, in the absence of clear definitions of similarity, an objective method cannot be developed.

We then shift our focus to quantifying the information in distance constraints using information theory. We use sets of exhaustive lattice walks to develop numerical measures of the information content of sets of exact distance constraints applied to specific conformational ensembles. We examine the effects of experimental uncertainties by considering "noisy" constraints.

We extend the use of information theory and simplified models in the following two chapters to quantitatively analyze the protocols involved in comparative modeling. We begin by deriving the ideal costs of sequence alignments and gap penalties based on gap distributions, using exhaustive sequence sets with simplified alphabets. We show that there are different gap penalties for different alphabet sizes and that there can be dependencies on the length of the sequences being aligned. In addition, we use two-dimensional lattice models to quantify the relative resolving power of some commonly used force fields. We show that long-range intra-atomic interactions are the most informative.

The last chapter of this thesis is an investigation of charge models in calculations of free energies of binding. Through the use of a large test set, we show that optimization of parameters, specifically those involved in calculating the nonpolar contributions to the free energy, can significantly increase the correlation of calculated free energies with those obtained from experiment.
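
The idea of measuring the information content of an exact distance constraint on an exhaustive lattice ensemble can be illustrated with a small sketch. This is not the thesis's implementation: it assumes a two-dimensional square lattice, a uniform ensemble of self-avoiding walks that is not reduced by symmetry, and information defined as the log ratio of ensemble sizes before and after the constraint is applied. The function names `self_avoiding_walks` and `constraint_information` are hypothetical.

```python
import math

# Unit moves on the 2D square lattice (an assumption; the abstract does not
# specify the lattice used for the distance-constraint study).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n_steps):
    """Enumerate every self-avoiding walk of n_steps starting at the origin."""
    walks = []

    def extend(path, visited):
        if len(path) == n_steps + 1:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return walks


def constraint_information(walks, i, j, squared_distance):
    """Bits gained by learning the exact squared distance between beads i and j.

    Assumes every walk in the ensemble is equally likely, so the information
    is log2(N_before / N_after).
    """
    kept = [
        w for w in walks
        if (w[i][0] - w[j][0]) ** 2 + (w[i][1] - w[j][1]) ** 2 == squared_distance
    ]
    if not kept:
        raise ValueError("no conformation satisfies the constraint")
    return math.log2(len(walks) / len(kept)), kept


if __name__ == "__main__":
    ensemble = self_avoiding_walks(3)  # 4 beads, 3 bonds
    bits, survivors = constraint_information(ensemble, 0, 3, 1)
    print(f"{len(ensemble)} walks, {len(survivors)} survive, {bits:.3f} bits")
```

Under these assumptions, a constraint that leaves fewer conformations carries more information, and a "noisy" constraint could be modeled by accepting a range of squared distances instead of a single value.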
Keywords/Search Tags:Information, Sequence