Research and Analysis of the Biological Sequence Alignment Algorithm Kalign

Posted on: 2009-06-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Y H Pu | Full Text: PDF
GTID: 2208360245961179 | Subject: Computer application technology
Abstract/Summary:
Sequence alignment is one of the most fundamental problems in modern bioinformatics. As biological databases grow exponentially, multiple alignment algorithms face higher demands on both sensitivity and efficiency, and fast, sensitive biological sequence alignment is an active research topic. This dissertation studies the multiple sequence alignment problem. The main work is summarized as follows:

First, we describe the basic concepts of sequence alignment, including gap penalties, substitution matrices, and the standards used to assess alignment results. Second, we study and describe the algorithms ClustalW, T-Coffee, and Muscle, all of which are based on the progressive alignment strategy. Then, building on the analysis of these algorithms, we improve Kalign, a multiple sequence alignment algorithm that is also based on the progressive strategy.

Kalign is a widely used method that employs the Wu-Manber approximate string matching algorithm to improve both the accuracy and the speed of multiple sequence alignment, and it is especially well suited to aligning large numbers of sequences or divergent sequences. However, its alignment quality is limited by inaccurate estimates of the distances between sequences. In this thesis we introduce Kalign's, an algorithm that refines the alignment created by Kalign. First, we calculate the pairwise distances between sequences with a new method based on the alignment produced by Kalign. Then a new guide tree, which dictates the order of pairwise alignments, is built from the matrix of pairwise distances between all sequences using the UPGMA method. Finally, a new alignment is produced by a progressive alignment method. These steps are repeated until convergence or until a user-defined iteration limit is reached. We use the BAliBASE 3.0 alignment benchmark set to assess our method. The results show that our algorithm achieves more accurate alignments than Kalign.
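
The first two refinement steps (distances re-estimated from an existing alignment, then a UPGMA guide tree) can be illustrated with a minimal Python sketch. The abstract does not specify the thesis's new distance measure, so the sketch assumes a simple stand-in: one minus the fraction of identical residues over columns where both sequences have a residue. The function names `identity_distance`, `distance_matrix`, and `upgma` are hypothetical and are not taken from Kalign or from the thesis; the progressive alignment and iteration steps are not shown.

```python
from itertools import combinations


def identity_distance(a, b, gap="-"):
    """Assumed stand-in distance: 1 - fractional identity of two aligned rows.

    Only columns where both rows hold a residue (no gap) are counted.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 1.0  # no shared columns: treat as maximally distant
    matches = sum(x == y for x, y in pairs)
    return 1.0 - matches / len(pairs)


def distance_matrix(alignment):
    """Symmetric matrix of pairwise distances between all aligned rows."""
    n = len(alignment)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = identity_distance(alignment[i], alignment[j])
    return d


def upgma(dist):
    """Build a guide tree (nested tuples of row indices) with UPGMA."""
    n = len(dist)
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (i, 1) for i in range(n)}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        (ta, sa), (tb, sb) = clusters.pop(a), clusters.pop(b)
        d.pop((a, b))
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            # size-weighted average distance to the merged cluster
            d[(c, next_id)] = (sa * dac + sb * dbc) / (sa + sb)
        clusters[next_id] = ((ta, tb), sa + sb)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Toy alignment (made-up sequences, for illustration only)
aln = ["ACGTACGT", "ACGTACGA", "TTGTAC-T", "TTGAAC-T"]
tree = upgma(distance_matrix(aln))
print(tree)  # ((0, 1), (2, 3))
```

In an iterative scheme of the kind the abstract describes, the resulting tree would fix the order of the next round of progressive alignment, and the loop would stop once the alignment no longer changes or the iteration limit is reached.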
Keywords/Search Tags: multiple sequence alignment, Kalign's, distance measures, iterative alignment