
Research On Multiple Sequence Alignment Algorithms In Bioinformatics

Posted on: 2006-12-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 1118360155458203
Subject: Computer applications
Abstract/Summary:
The accumulation of biological sequence data offers a bright future for life sciences research, but it also poses a severe challenge for data processing. The main goal of bioinformatics is to mine valuable biological information from this vast body of sequence data, to understand the structure, function and evolution of genes and proteins, to understand ourselves at the molecular level, and ultimately to benefit humankind. This dissertation addresses two problems: multiple sequence alignment and phylogenetic analysis. The main work is summarized as follows.

We describe existing multiple alignment algorithms, such as ClustalW, T-Coffee, DiAlign, Prrp, MultAlin and Muscle, and examine the strengths and weaknesses of the most widely used multiple alignment packages.

ClustalW is one of the most widely used multiple sequence alignment programs, but its accuracy is lower for distantly related sequences. Drawing on MultAlin, we develop a new iterative progressive multiple alignment algorithm, IPMSA. To assess its accuracy, IPMSA is tested against ClustalW and MultAlin on the BAliBASE benchmark database of multiple sequence alignments. The results indicate that the alignment accuracy of IPMSA is 3.1% higher than that of ClustalW and 19.6% higher than that of MultAlin.

Because a distance matrix based on pairwise alignment cannot calculate evolutionary distance objectively and effectively, a new method based on information theory, FDOD, is introduced. This method describes each sequence by the distribution of its subsequences and calculates the evolutionary distance between sequences as an information discrepancy. It is simple, fast, objective and effective.
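
The idea of describing sequences by subsequence distributions and comparing them by information discrepancy can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the abstract does not give the formula, so the sketch assumes the commonly cited form of the measure, B(U_1, ..., U_s) = sum over k and i of u_ik * log(u_ik / mean_k(u_ik)), and the function names, the subsequence length k, and the use of base-2 logarithms are all hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def kmer_distribution(seq, k):
    """Relative frequency of every length-k subsequence (k-mer) in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def information_discrepancy(dists):
    """Assumed discrepancy form: sum of u * log2(u / mean of u across distributions)."""
    s = len(dists)
    support = set().union(*dists)  # every k-mer seen in any distribution
    total = 0.0
    for kmer in support:
        mean = sum(d.get(kmer, 0.0) for d in dists) / s
        for d in dists:
            u = d.get(kmer, 0.0)
            if u > 0:  # terms with u == 0 contribute nothing
                total += u * log2(u / mean)
    return total


def distance_matrix(seqs, k=2):
    """Symmetric matrix of pairwise discrepancies; no alignment is performed."""
    dists = [kmer_distribution(s, k) for s in seqs]
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = information_discrepancy([dists[i], dists[j]])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Under this assumed form, two identical distributions give a discrepancy of 0, and two sequences that share no k-mers give the maximum pairwise value of 2. Calling distance_matrix(["ACGTACGT", "ACGTACGA", "TTTTGGGG"], k=2) would therefore place the first two sequences much closer to each other than either is to the third.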
Furthermore, the time complexities of the pairwise-alignment method and the FDOD method are O(N^2L^2) and O(N^2L) respectively.

The FDOD method is applied to multiple sequence alignment research for the first time, and a new multiple sequence alignment algorithm, MSAID, is developed based on a measure of information discrepancy and on IPMSA. MSAID has two portions: MSAID-1 and MSAID. Both are tested and compared with earlier methods using the reference alignments of BAliBASE. For alignments without large N/C-terminal extensions or internal insertions, MSAID achieved the highest overall average score.

Phylogenetic analysis of genomes is one of the research fields of bioinformatics. It is hard to rebuild a phylogenetic tree based on multiple sequence alignment because of the...
Keywords/Search Tags: Bioinformatics, multiple sequence alignment, progressive alignment algorithm, iterative alignment strategy, phylogenetic tree, FDOD function