
Bayesian pairwise sequence alignment algorithms

Posted on: 2003-07-08
Degree: Ph.D.
Type: Dissertation
University: Rensselaer Polytechnic Institute
Candidate: Webb, Bobbie-Jo Mary
GTID: 1468390011487076
Subject: Biology
Abstract/Summary:
Pairwise sequence alignment is one of the most fundamental tools in bioinformatics. The most popular current methods are based on dynamic programming and heuristics. These algorithms yield a single alignment which, although optimal or near-optimal, can be strongly affected by the choice of parameters, i.e., the scoring matrix and gap penalties. Additionally, the raw scores depend on the lengths of the two sequences being aligned, so a post-analysis conversion is required to assess the significance of the relationship between a pair of sequences. These limitations can be overcome by formulating sequence alignment as a Bayesian inference problem.

Two new Bayesian Algorithms for Local Sequence Alignment, BALSA and BALSA-k, have been developed. They account for the uncertainty associated with all unknown variables by incorporating a series of scoring matrices, gap parameters, and all possible alignments into the forward algorithm. The Bayesian formulation makes it straightforward to obtain the joint and marginal alignments, samples of alignments drawn from the posterior distribution, and the posterior probabilities of the scoring matrices and gap parameters. Furthermore, both algorithms automatically adjust for variations in sequence length, so the statistical significance of the relationship between two sequences can be calculated directly from either the BALSA or the BALSA-k score.

BALSA and BALSA-k were compared with the best-performing dynamic programming algorithm, SSEARCH. On a well-characterized protein database, PDB40D-B, the two algorithms detected 19.8% and 19.6% of related pairs, respectively, at 1% EPQ (errors per query), whereas SSEARCH detected 18.4% at the same error rate. Applying BALSA-k to a well-studied set of orthologous human-rodent gene pairs placed 100% of experimentally defined binding sites in the region identified as conserved between the two species, compared with 92.9% for the Bayes Block Aligner, the first Bayesian alignment algorithm to be developed.
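The core idea above, summing over all alignments with the forward algorithm for each candidate scoring scheme and then weighing the schemes by their posterior probability, can be illustrated with a small sketch. This is NOT BALSA or BALSA-k; it is a deliberately simplified toy under stated assumptions: DNA sequences with a uniform background, hypothetical match/mismatch scoring schemes converted to odds ratios with a Karlin-Altschul-style scale lambda, a linear gap weight exp(-lambda * gap), and, most importantly, the assumption that the summed local-alignment weight Z(theta) is proportional to the likelihood P(x, y | theta) with the same constant for every parameter set. The names `karlin_lambda`, `local_forward`, and `parameter_posterior` and the example schemes are hypothetical.

```python
import math
from typing import Callable, Dict, Optional, Tuple

# Hypothetical parameter set: (match score, mismatch score, linear gap penalty).
Params = Tuple[float, float, float]


def karlin_lambda(match: float, mismatch: float, alphabet_size: int = 4) -> float:
    """Find lambda > 0 with sum_{a,b} p_a p_b exp(lambda * s(a, b)) = 1,
    ASSUMING a uniform background, so P(a == b) = 1 / alphabet_size.
    Requires match > 0 and a negative expected score."""
    p_same = 1.0 / alphabet_size

    def f(lam: float) -> float:
        return p_same * math.exp(lam * match) + (1.0 - p_same) * math.exp(lam * mismatch) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:  # grow the bracket until f becomes positive
        hi *= 2.0
    for _ in range(100):  # bisection: f < 0 on (0, root) and f > 0 beyond it
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def local_forward(x: str, y: str, odds: Callable[[str, str], float], gap_weight: float) -> float:
    """Sum of weights over all local alignments of x and y: the sum-product
    analogue of Smith-Waterman with a linear gap weight. The empty alignment
    contributes 1. Toy version: plain floats, so long sequences may overflow."""
    n, m = len(x), len(y)
    # M: alignments ending in the aligned pair (x[i-1], y[j-1]).
    # X: ending with x[i-1] opposite a gap; Y: ending with y[j-1] opposite a gap.
    # Gap-to-gap switches (X directly to Y) are disallowed to keep each alignment unique.
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    total = 1.0  # empty alignment
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = odds(x[i - 1], y[j - 1])
            # Either start a new local alignment here (1.0) or extend one ending at (i-1, j-1).
            M[i][j] = w * (1.0 + M[i - 1][j - 1] + X[i - 1][j - 1] + Y[i - 1][j - 1])
            X[i][j] = gap_weight * (M[i - 1][j] + X[i - 1][j])
            Y[i][j] = gap_weight * (M[i][j - 1] + Y[i][j - 1])
            total += M[i][j]  # local alignments start and end with an aligned pair
    return total


def parameter_posterior(x: str, y: str, param_sets: Dict[str, Params],
                        prior: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Posterior over parameter sets, ASSUMING P(x, y | theta) is proportional
    to Z(theta) from local_forward, with the same constant for every theta."""
    if prior is None:
        prior = {name: 1.0 / len(param_sets) for name in param_sets}
    log_post: Dict[str, float] = {}
    for name, (match, mismatch, gap) in param_sets.items():
        lam = karlin_lambda(match, mismatch)

        def odds(a: str, b: str, lam: float = lam, match: float = match,
                 mismatch: float = mismatch) -> float:
            return math.exp(lam * (match if a == b else mismatch))

        z = local_forward(x, y, odds, math.exp(-lam * gap))
        log_post[name] = math.log(prior[name]) + math.log(z)
    # Normalize in log space for numerical stability (log-sum-exp).
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {name: math.exp(v - top) / norm for name, v in log_post.items()}


if __name__ == "__main__":
    schemes = {"+1/-1, gap 2": (1.0, -1.0, 2.0), "+2/-1, gap 3": (2.0, -1.0, 3.0)}
    for name, p in parameter_posterior("ACGTTGCA", "ACGTAGCA", schemes).items():
        print(f"{name}: {p:.3f}")
```

Summing (rather than maximizing) over alignments is what lets every alignment, not just the best one, contribute evidence for a scoring scheme; the same forward tables could in principle be reused for sampling alignments by stochastic traceback, which this sketch does not implement.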
Keywords/Search Tags: Alignment, Bayesian, Algorithm