Font Size: a A A

Finding MUMs With Enhanced Suffix Arrays

Posted on:2008-08-12Degree:MasterType:Thesis
Country:ChinaCandidate:H T GuoFull Text:PDF
GTID:2178360212474585Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Since the first successful whole-genome shotgun sequence of Haemophilus influenzae, the number of organisms whose genomes have been completely sequenced has been increasing rapidly each year. When the genome sequence of two closely related organisms becomes available, one of the first questions researchers want to ask is how the two genomes align and annotate. For this, traditional sequence alignment algorithm has been incompetent for this task. So many new approaches for aligning genome-length sequences have been presented.MUMmer is one of the most prominent systems which are designed to align whole genome sequences, in which the anchor-based method and the algorithm using suffix trees to find potential anchors for an alignment have been adopted by more and more genome-scale alignment systems. So at the beginning of this paper we first carry on analysis to the algorithms, whole structure and system evolution on MUMmer, which provided a reference for the improvement of the system and putting forward new anchor-based approaches to be used to align genome-length sequences.Considering the limit of the computable resource available and the increase of the scale of sequences to be handled, although both the time and space complexity of the construction of suffix trees and the algorithm which uses suffix trees to find the maximal unique match (MUM) between two genomes are linear, the space cost of suffix trees is still a very great problem. So we decide to use enhanced suffix array to find MUMs between two genomes instead of suffix tree. In this paper, we implement two algorithms based on enhanced suffix array, which respectively use the properties of the enhanced suffix array and simulate the manner of traversal on suffix trees. Comparing to the corresponding method based on suffix tree, they share the same time complexity, and larger spaces are saved. In addition, during the process of using enhanced suffix array to simulate streaming match algorithm based on suffix tree, we present a new linear algorithm for incorporating suffix links into suffix array and give a provement to its correctness.
Keywords/Search Tags:suffix tree, enhanced suffix array, maximal unique match MUM, suffix link, anchor-based sequences alignment approach
PDF Full Text Request
Related items