Font Size: a A A

Probabilistic computational methods for structural alignment of RNA sequences

Posted on:2011-01-23Degree:Ph.DType:Thesis
University:University of RochesterCandidate:Harmanci, Arif OzgunFull Text:PDF
GTID:2448390002967459Subject:Engineering
Abstract/Summary:
In this thesis, the problem of structural alignment of homologous RNA sequences is addressed. The structural alignment of a given set of RNA sequences is a secondary structure for each sequence, such that the structures are similar to each other, and a sequence alignment between the sequences that is conforming with the secondary structures. A solution to this problem was proposed by Sankoff as a dynamic programming algorithm whose time and memory complexities are polynomial in the length of shortest sequence and exponential in the number of input sequences, respectively. Variants of Sankoff's method employ constraints that reduce the computation by restricting the allowed alignments or structures. In the first part of the thesis, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed and these probabilities are additively combined to obtain probabilities of co-incidence. The constraints on alignments are computed by adaptively thresholding these probabilities to determine co-incidence constraints for pruning of computations that hold with high probability. The proposed constraints are implemented into Dynalign, a free energy minimization algorithm for structural alignment. Compared with prior non-adaptive approaches, the probabilistic constraints offer a significant reduction in computation time along with a marginal increase in base pair prediction accuracy.;Next, a novel algorithm for structural alignment of two RNA sequences is presented. The similarity of the structures is imposed by matched helical regions in the structural alignnment. A matched helical region represents a conserved helix in each structure. Compared to the structural alignment models of previous methods, the matched helical regions extend the possibilities for alignment of paired nucleotides, which enables the new structural alignment space to better accommodate the structural variability within a sequence family.;A probability distribution over the space of structural alignments is proposed based on pseudo-free energy changes, that account for both stability of structures and plausibility of the sequence alignment. Three different problems are addressed for structural alignment prediction as inferences from the distribution: 1. Estimation of the maximum a posteriori (MAP) structural alignment, i.e., the structural alignment with highest probability in the space, 2. Computation of base pairing probabilities of nucleotides in each sequence, 3. Sampling the structural alignment space for analysis of modes of the distribution over structural alignments.;Finally, the problem of structure prediction for an arbitrary number of homologous sequences is addressed. To circumvent the computational complexity, the prediction is broken down into an iterative computation of base pairing probabilities for each sequence using extrinsic information for base pairing derived from other sequences. For a sequence, extrinsic information is generated by using base pairing probabilities for other sequences from the previous iteration and wrapping them on the sequence in consideration via the co-incidence probabilities for alignment between the sequences. The algorithm iteratively updates the base pairing probabilities and extrinsic information for each sequence and the final set of base pairing probabilities are utilized for prediction of structures. Compared to other multi-sequence structure prediction methods, the proposed method has higher or comparable accuracy of structure prediction and time and memory requirements scale favorably with increasing number of input sequences as compared to joint alignment and folding algorithms.
Keywords/Search Tags:Alignment, Sequences, Base pairing probabilities, Computation, Methods, Compared, Algorithm
Related items