Computational genome analysis by alignment

Posted on: 2009-12-28
Degree: Ph.D.
Type: Thesis
University: University of Southern California
Candidate: Yang, Yi
Full Text: PDF
GTID: 2448390002990978
Subject: Biology
Abstract/Summary:
In the first part of this thesis, we describe a global multiple alignment program that aligns large-scale genomic sequences. Such tools are important for analyzing biological sequences. Heuristic algorithms based on the progressive alignment approach have previously been applied to this problem. Our approach instead considers matches among multiple sequences simultaneously. It adopts an idea previously used in sequencing projects and transforms the multiple alignment problem into a shortest path problem in a de Bruijn graph, which can be solved in linear time. The multiple alignment is represented by a directed acyclic graph, which provides a platform for further comparative studies. We have tested our method on a CFTR region from 13 species with an average size of 1 Mbp, and accurate results are obtained in time comparable to that of progressive alignment algorithms.

In the second part of this thesis, we describe a filtration algorithm that quickly detects significantly overlapping optical maps among millions of optical maps. Optical mapping is a powerful technology that allows the construction of ordered restriction maps. Research problems such as whole-genome restriction map assembly and comparison need a method that can quickly explore massive data sets and identify significant matches without resorting to exhaustive dynamic programming comparison. Our filtration method is based on a restriction fragment tuple matching scheme and dynamic programming. We demonstrate the capability of the program on several simulated datasets and on a whole human mole dataset. Our method shows a good balance between speed and accuracy.
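
The fragment tuple matching idea from the second part can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the tuple length `k`, the bin width `bin_kb`, and the `min_shared` threshold are all assumptions chosen for illustration. Each optical map is treated as a list of restriction fragment lengths in kb, every run of `k` consecutive fragments is quantized into a hashable key, and two maps become candidates for a later dynamic programming comparison only if they share enough keys. Map orientation, missing cut sites, and sizing errors near bin boundaries are ignored here.

```python
from collections import defaultdict
from itertools import combinations


def fragment_tuples(fragments, k=3, bin_kb=2.0):
    """Yield quantized k-tuples of consecutive fragment lengths (hypothetical scheme)."""
    # Quantize each length so that small sizing differences fall in the same bin.
    binned = [int(length // bin_kb) for length in fragments]
    for i in range(len(binned) - k + 1):
        yield tuple(binned[i:i + k])


def candidate_pairs(maps, k=3, bin_kb=2.0, min_shared=1):
    """Return pairs of map ids that share at least `min_shared` quantized tuples."""
    # Index: quantized tuple -> set of map ids containing it.
    index = defaultdict(set)
    for map_id, fragments in maps.items():
        for key in fragment_tuples(fragments, k, bin_kb):
            index[key].add(map_id)

    # Count shared tuples per pair; only these pairs go on to full comparison.
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


# Toy data (invented): m1 and m2 overlap on three fragments, m3 is unrelated.
toy_maps = {
    "m1": [10.2, 25.1, 7.9, 40.3, 12.5],
    "m2": [7.5, 41.0, 13.1, 30.2],
    "m3": [3.0, 3.5, 50.0, 90.0],
}
pairs = candidate_pairs(toy_maps)
```

In this toy run only the pair `("m1", "m2")` survives the filter, so dynamic programming would be run on one pair instead of three. A practical filter would also need to tolerate measurement error, for example by probing neighboring bins.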
Keywords/Search Tags: Alignment, Method