
Statistical Models And Algorithms For Aligning Multiple Sequences

Posted on: 2008-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: L X Cheng
Full Text: PDF
GTID: 2178360212474586
Subject: Computer software and theory
Abstract/Summary:
Multiple sequence alignment and phylogenetic analysis are important fields of study in bioinformatics. By aligning sequences and constructing phylogenetic trees, we can predict the structure and function of new sequences and analyze their homologous relationships. Improving alignment accuracy and reconstructing reasonable phylogenetic trees are the main research goals. This dissertation addresses both problems, multiple sequence alignment and phylogenetic analysis. The main work is summarized as follows.

Drawing on ClustalW and T-Coffee, and combining the strengths of the progressive and consistency-based approaches, a new progressive multiple alignment algorithm, HMMPC, is developed. A pair-HMM computes the posterior probability that two positions, one from each of two sequences, are aligned. This probability is then combined with information from the other sequences to obtain a final posterior probability, which is used to build the alignment progressively. To assess its accuracy, HMMPC is tested on the BAliBASE database, a benchmark for evaluating multiple sequence alignment algorithms, and compared with ClustalW, T-Coffee and MUSCLE. The results indicate that HMMPC achieves higher alignment accuracy than ClustalW and MUSCLE within a practical running time.

Comparison between two sequences is the basis of biological sequence analysis in bioinformatics. A new method based on information theory, SimKMM, is introduced in this dissertation. The method describes a sequence by the distribution of its subsequences and calculates the evolutionary distance using the Kullback-Leibler divergence. It is simple, fast, objective and effective. We apply the proposed method to six DNA sequences and to database search. The results of the two experiments show that the algorithm is practical and is more effective than other common methods for sequences that contain many cross-matching segments.
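The alignment-free idea described in the abstract (represent each sequence by the distribution of its subsequences, then compare the distributions with the Kullback-Leibler divergence) can be illustrated with a minimal Python sketch. It is not the SimKMM method itself, whose exact formulation the abstract does not give. The function names, the choice of fixed-length k-mers with k = 2, the add-one pseudocount and the symmetrization by averaging are all assumptions made only for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(seq, k=2, alphabet="ACGT", pseudocount=1.0):
    """Return the smoothed relative frequency of every k-mer over `alphabet`.

    The pseudocount (an assumption of this sketch) keeps every probability
    positive so that the divergence below is always finite.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in kmers) + pseudocount * len(kmers)
    return {w: (counts[w] + pseudocount) / total for w in kmers}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for distributions on the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def symmetric_kl_distance(a, b, k=2):
    """Average of D(p || q) and D(q || p) over the k-mer distributions of a and b.

    Averaging the two directions is one common way to obtain a symmetric
    dissimilarity; it is an illustrative choice, not taken from the thesis.
    """
    p = kmer_distribution(a, k)
    q = kmer_distribution(b, k)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

Calling `symmetric_kl_distance(seq_a, seq_b)` on every pair of DNA sequences would give a matrix of dissimilarities that a distance-based tree-building method could take as input. Because only k-mer counts are compared, rearranged or cross-matching segments do not need to be placed in a single linear alignment.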
Keywords/Search Tags:sequence alignment, alignment-free, consistency, HMM, phylogenetic tree