
Segmenting protein and DNA sequences using dynamic Bayesian networks

Posted on: 2010-04-10
Degree: Ph.D
Type: Thesis
University: University of Washington
Candidate: Reynolds, Sheila Moore
Full Text: PDF
GTID: 2440390002979435
Subject: Biology
Abstract/Summary:
Hidden Markov models (HMMs) have been widely used in computational biology for over 20 years, with early applications ranging from gene- and motif-finding in DNA sequences to the prediction of protein secondary structure. In this thesis we address two important sequence segmentation tasks using novel probabilistic models from the class of dynamic Bayesian networks (DBNs), which represent a generalization of the HMM. In particular, we exploit the notion of "virtual evidence" in our DBNs to define flexible constraints and to incorporate arbitrary evidence tracks.

Sequence segmentation is the task of partitioning an input sequence into a set of (generally) variable-length, non-overlapping regions in which each region is assigned a single label chosen from a set of pre-defined labels.

The first segmentation task we address is that of segmenting a membrane-spanning protein according to the topological relationship of each segment to a cellular membrane. The challenges in this task include partially labeled data, one-to-many relationships between labels and states in the model, and the goal of finding the most likely sequence of labels rather than the most likely sequence of states. Membrane-spanning segments are frequently identifiable simply by the hydrophobic nature of the constituent amino acids, and this strong hydrophobicity signal is exploited by our model.

The second segmentation task that we address is related to the complex three-dimensional combination of DNA and protein called chromatin. At the first level of the chromatin structure, the repeat element is called a nucleosome, with neighboring nucleosomes connected by short, variable-length segments of "linker" DNA. In contrast to the strong membrane-topology signal that exists at the amino acid level in the first task, whether there even exists a nucleosome-forming signal in the DNA sequence is still a subject of ongoing research. We have developed a novel discriminative classification approach to scoring the nucleosome-forming potential of a short DNA segment, and we use these scores in the context of a DBN to predict nucleosome positions along a chromosome.

We hope that these two demonstrations of the power of DBNs will spur their use in diverse areas of the field of computational biology.
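
As a rough illustration of what sequence segmentation means in practice, the sketch below labels a protein sequence with a plain two-state HMM and Viterbi decoding, then collapses the per-position labels into non-overlapping regions. It is not the DBN described in the thesis: the two labels ("M" for membrane-spanning, "N" for non-membrane), the hydrophobic residue set, and every probability are assumptions chosen only for illustration. Because each label here maps to exactly one state, the most likely state sequence and the most likely label sequence coincide, which is not the case in the one-to-many setting the thesis addresses.

```python
import math
from itertools import groupby

# Hypothetical two-label model; all numbers are illustrative assumptions,
# not parameters from the thesis.
HYDROPHOBIC = set("AILMFVWC")
STATES = ("N", "M")  # N = non-membrane, M = membrane-spanning
START = {"N": 0.9, "M": 0.1}
TRANS = {
    "N": {"N": 0.95, "M": 0.05},
    "M": {"N": 0.05, "M": 0.95},
}
# Probability that a residue is hydrophobic, given the state.
P_HYDROPHOBIC = {"N": 0.3, "M": 0.8}


def emission_logp(state, residue):
    """Log-probability of seeing a hydrophobic or polar residue in a state."""
    p = P_HYDROPHOBIC[state]
    return math.log(p if residue in HYDROPHOBIC else 1.0 - p)


def viterbi(sequence):
    """Return the most likely state path for the sequence (log-space Viterbi)."""
    if not sequence:
        return []
    score = {s: math.log(START[s]) + emission_logp(s, sequence[0]) for s in STATES}
    backpointers = []
    for residue in sequence[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this position.
            best_prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + emission_logp(s, residue)
            )
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse per-position labels into (label, start, end) regions, end exclusive."""
    out, pos = [], 0
    for label, run in groupby(path):
        length = len(list(run))
        out.append((label, pos, pos + length))
        pos += length
    return out
```

Calling `segments(viterbi(seq))` on an amino acid string returns a list of `(label, start, end)` tuples that tile the sequence without overlap, which is the partition-plus-label output that the definition of segmentation above describes.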
Keywords/Search Tags: DNA, Sequence, Protein