Font Size: a A A

Multiple Sequences Alignment Based On A-Star And DiAlign Algorithms

Posted on:2008-04-21Degree:MasterType:Thesis
Country:ChinaCandidate:Y FangFull Text:PDF
GTID:2178360212474587Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Sequence alignment is the most common fundamental subject in modern bioinformatics. Through sequence alignment, we can predict the structure and function of new sequences, analysis the evolutionary linkage of sequences, and do phylogenetic analysis. Firstly, we describe the basic problem about sequence alignment, such as gap penalty, substitution matrix and the standard of assessing alignment result. Secondly, two methods, A* algorithm and Diagonal Alignment algorithm, are introduced in this paper.In theory, the Needleman–Wunsch algorithm can be extended quite easily to produce score optimal alignments of more than two sequences. The complexity of the algorithm grows exponentially with the number of sequences, the algorithm is prohibited to extend to more than three sequences. A* algorithm restricts the search space of the multiple alignment problem by using reasonable heuristics. In this way, as many as eight protein sequences, each containing several hundred amino acid residues, have been aligned simultaneously. First, MSA problem is transformed to the shortest path problem, then we uses A* algorithm, which can find efficiently the shortest path and decrease the time complexity. This paper applies the A* algorithm to multiple sequence alignment problem with more powerful estimator and upper bound, mostly decrease the search space. Try my best to improve the performance of program and algorithm: finding closer bound value and using Hash list instead of simple link list. At the last, we compare our algorithm with several other methods in testing examples. The result shows that program after setting bound value and using Hash list is several times more efficient than before. Speaking in theory, the result should be very perfect if we select appropriate Objective Function and Substitution Matrix. However, the real result is different from the ideal situation, heuristic function and Objective Function expect to be improved. But the potential of the algorithm is great. Once Objective Function and Substitution Matrix acquire breakthrough, the future is bright.In this paper, we introduce another algorithm for sequence alignment named Diagonal Alignment which differs from other algorithms. Instead of comparing individual residues, we use gap-free segment-to-segment alignments with variable segment length. In particular, our algorithm does not employ any gap penalties. We propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. In the stage of finding diagonals, we try to use new method. Namely the fact that we should align two...
Keywords/Search Tags:Bioinformatics, Multiple Sequence Alignment, A-Star algorithm, Heuristic function, Diagonal Alignment
PDF Full Text Request
Related items