Sequence Alignment Based On The Codon Substitution Matrix

Posted on: 2007-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Chen
Full Text: PDF
GTID: 2178360242461841
Subject: Computer software and theory
Abstract/Summary:
Sequence alignment has become an essential tool in bioinformatics. It is traditionally used to compare two or more protein or nucleotide sequences, analyse their similarity and homology, find conserved regions and characteristic motifs in protein families, and discover sequence patterns associated with common functions.

Sequence alignment includes pairwise sequence alignment and multiple sequence alignment. Multiple sequence alignment is a typical NP-hard problem. Many multiple sequence alignment algorithms and programs exist today, each trying to obtain the best result within acceptable time and space costs.

Substitution matrices play an important role in sequence alignment: the quality of an alignment depends directly on the accuracy of the substitution matrix. At present, the substitution matrices used in biological sequence alignment are either amino acid substitution matrices or nucleotide substitution matrices. It is well known that gene mutation occurs at the nucleotide level, caused by base mutation, whereas natural selection acts at the amino acid level, because protein structure and function determine whether a mutation is accepted. A substitution matrix built only from amino acids or only from nucleotides is therefore likely to lose genetic information.

Codons carry information about both the nucleotides and the corresponding amino acids. The codon can therefore be used as the basic unit of evolution to study the substitution rules of DNA sequences and to build a codon substitution matrix, which can then be used in sequence alignment algorithms to improve alignment accuracy.

To assess the performance of the new codon substitution matrix in multiple sequence alignment, C-Muscle was constructed to accept a codon score matrix and was used as the sequence alignment program. BAliBASE, an internationally recognized benchmark database, was used as the benchmark dataset. The experimental results show that C-Muscle with the proposed codon substitution matrix outperforms Muscle, one of the best multiple sequence alignment programs.
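
To make the idea of codon-level scoring concrete, the following is a minimal Python sketch of a global (Needleman-Wunsch style) alignment score computed over codons instead of single nucleotides or amino acids. It is not the matrix proposed in the thesis and not the C-Muscle implementation: the function `toy_codon_score`, its weights, and the gap penalty of -4 are assumptions chosen only for illustration. The toy score rewards shared bases (nucleotide-level information) and adds a bonus or penalty depending on whether the two codons encode the same amino acid (protein-level information).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def toy_codon_score(c1, c2):
    """Hypothetical codon score (illustration only, not the thesis matrix).

    Counts identical base positions (0..3), then adds +2 if both codons
    encode the same amino acid and -2 otherwise.
    Expects uppercase codons made of A, C, G, T.
    """
    shared_bases = sum(a == b for a, b in zip(c1, c2))
    same_amino_acid = CODON_TO_AA[c1] == CODON_TO_AA[c2]
    return shared_bases + (2 if same_amino_acid else -2)


def split_codons(dna):
    """Split an in-frame DNA string into a list of 3-letter codons."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def global_align_score(seq1, seq2, gap=-4):
    """Best global alignment score of two coding sequences, codon by codon.

    Uses the standard dynamic-programming recurrence with a linear gap
    penalty (assumed value), keeping only two rows of the table.
    """
    a, b = split_codons(seq1), split_codons(seq2)
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            curr.append(max(
                prev[j - 1] + toy_codon_score(a[i - 1], b[j - 1]),  # align codons
                prev[j] + gap,                                      # gap in seq2
                curr[j - 1] + gap,                                  # gap in seq1
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # CTT and CTC both encode leucine, so the substitution is scored mildly.
    print(global_align_score("ATGCTT", "ATGCTC"))
```

In this sketch a synonymous substitution such as CTT to CTC scores higher than a non-synonymous one with the same number of changed bases, which is the kind of distinction that a purely nucleotide-based or purely amino-acid-based matrix cannot express on its own. A real codon substitution matrix would be estimated from observed substitutions in aligned sequences, not from fixed weights like these.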
Keywords/Search Tags: Multiple Sequence Alignment, Codon, Substitution Matrix, Bioinformatics