Font Size: a A A

High performance computational biology algorithms

Posted on:2011-08-05Degree:Ph.DType:Dissertation
University:University of Illinois at ChicagoCandidate:Saeed, FahadFull Text:PDF
GTID:1448390002466758Subject:Biology
Abstract/Summary:
Multiple Sequence s Alignment (MSA) of biological sequences is a fundamental problem in computational biology due to its critical significance in wide ranging applications including haplotype reconstruction, sequence homology, phylogenetic analysis, and prediction of evolutionary origins. The MSA problem is considered NP-hard and known heuristics for the problem do not scale well with increasing number of sequences. On the other hand, with the advent of new breed of fast sequencing techniques it is now possible to generate thousands of sequences very quickly. For rapid sequence analysis, it is therefore desirable to develop fast MSA algorithms that scale well with the increase in the dataset size. In this dissertation, we propose a novel domain decomposition based technique to solve the multiple sequence alignment problem on multiprocessing platforms. The domain decomposition based technique, in addition to yielding better quality, gives enormous advantage in terms of execution time and memory requirements. The proposed strategy allows to decrease the time complexity of any known heuristic of O(N)x complexity by a factor of O(1/ p)x, where N is the number of sequences, x depends on the underlying heuristic approach, and p is the number of processing nodes. In particular, we propose a highly scalable algorithm, Sample-Align-D, for aligning biological sequences using Muscle system as the underlying heuristic. In this dissertation, we also develop a highly scalable parallel algorithm based on domain decomposition, referred to as P-Pyro-Align, to align large number of reads from single or multiple reference genomes obtained from pyrosequencing procedure. The proposed alignment algorithm accurately aligns the erroneous reads in a short period of time. The proposed algorithms have been implemented on a cluster of workstations using MPI library. We report high quality multiple alignment of up to 0.5 million reads with our analysis suggesting that up to 10 million or more reads can be aligned using our parallel algorithm. The algorithms are shown to be highly scalable and exhibits super-linear speedups with increasing number of processors.
Keywords/Search Tags:Algorithm, Highly scalable, MSA, Sequences, Problem, Alignment
Related items