Font Size: a A A

Research On The Improved Algorithms For Genome-scale Alignment

Posted on:2010-01-18Degree:MasterType:Thesis
Country:ChinaCandidate:L X DongFull Text:PDF
GTID:2178330332498586Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the number of organisms whose genomes have been completely sequenced increases rapidly each year, the genome-scale alignment is the new trend. The method using suffix tree to find the maximal unique match(MUM) between two genomes is the classical method when aligning genome-length sequences. Though both the time and space complexity of the construction of suffix trees and the algorithm that uses suffix trees to find MUMs between two genomes are linear, the space cost of suffix trees is still a very great problem. Besides that, the algorithm Smith-Waterman that is used to close gaps is less efficient when the number of gaps is large, for calculating a large scoring matrix is the bottleneck of the algorithm Smith-Waterman.First we use suffix array instead of suffix tree as the basic data structure to find MUMs. The methods based on suffix tree use suffix links to find MUMs, while there are not suffix links in suffix array. As a result, space complexity is effectively decreased. Besides that, we parallelize the calculation of scoring matrix by using multi-thread when we close gaps, different processors continuously select the elements not caculated of the scoring matrix, which accelerates the speed of closing gaps and makes better use of CPU, much time is reduced.
Keywords/Search Tags:Suffix array, Maximal unique match MUM, Smith-Waterman, Multi-thread
PDF Full Text Request
Related items