Font Size: a A A

Multiple Sequence Alignment Based On FFT And K-mer Dividing

Posted on:2008-09-23Degree:MasterType:Thesis
Country:ChinaCandidate:F X JiangFull Text:PDF
GTID:2178360212974586Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Sequence alignment is the most common fundamental subject in modern bioinformatics. As biological databases growing exponentially, it is posed higher request for the sensitivity and efficiency of multiple alignment algorithms. The research of fast and sensitive biology sequence alignment algorithm is a current hot topic of bioinformatics. We present a novel approach called KMD_MSA, which drastically reduces the time complexity without sacrificing the accuracy.Firstly, we describe the basic problem about sequence alignment, gap penalty, substitution matrix and standard of assessing alignment result and so on. Secondly, we mainly study and describe the algorithm ClustalW which is based on the progressive alignment strategy. Then through the analysis of these algorithms, we improve the multiple sequence alignments algorithms based on the progressive algorithm strategy.The divide-and-conquer are applied in MAFFT earliest, which reduced the CPU time from O(L2) to O(LlgL) by the application of FFT in alignment. In this report, combining the divide-and-conquer approach, k-mer is applied to the sequences alignment, a novel multiple sequence alignment program, KMD_MSA, has been developed. Homologous regions are rapidly identified; the original alignment problem is divided into some sub-problems to solve. And it reduces the time of identified homologous segments from O(LlgL) to O(L). In this study, we combine two techniques, k-mer lookup and compressed alphabets, to achieve reduction in the time and space complexity of pairwise alignment. The performance of KMD_MSA is compared with the most popular methods by benchmark tests. The test shows that the time of KMD_MSA is drastically reduced as compared with other methods with comparable accuracy. These algorithms are also proved feasible and efficient in biology sensitivity and computing efficiency.
Keywords/Search Tags:sequence alignment, MAFFT, divide-and-conquer, FFT k-mer, KMD_MSA
PDF Full Text Request
Related items