Font Size: a A A

Multiple Sequence Alignment Algorithm

Posted on:2008-02-13Degree:MasterType:Thesis
Country:ChinaCandidate:S QiuFull Text:PDF
GTID:2178360212996107Subject:Computational Mathematics
Abstract/Summary:PDF Full Text Request
With the implement of Human Genome Project, a large amount of new data about molecular biology have been obtained. It is a very arduous task which is about how to capture the valuable knowledge from it. So bioinformatics has become a new subject and developed increasingly. Sequence alignment is a basic information processing method in bioinformatics and useful for discovering functional, structural, and evolutionary information in biological sequences. Sequence alignment is to find the comparability of these sequences by using mathematical models or algorithm. The results of sequence alignment reflect the comparability and the biologic profile of these sequences. So it becomes a very important research field in bioinformatics which is about how to design a reasonable and effect sequence alignment algorithm. In this paper, we research the multiple sequence alignment algorithm in detail.Multiple sequence alignment is one of alignment algorithm for more than two sequences which have system evolutionary relationship. The research for multiple sequence alignment algorithm is processing. There are several methods in this research field in foreign countries and now we focus on Hidden Markov Model, which has already got good results in bioinformatics. Hidden Markov Model is a statistical model, which is very well suited for many tasks in molecular biology, although it has been mostly developed for speech recognition before. Because of its strong statistical foundation and effective training methods, it can be used in many kinds of problems.Hidden Markov Model includes two kinds of random processes. One is finite markov model inside, the other is a set of random functions. Each function has a relationship with a status. Hidden Markov Model changes its status according to diversion probability matrix. Because observers can only see the export data, and can not see the status of markov model inside, it is called Hidden Markov Model. An integrate HMM needs two parameters, called N M, and three probability distributions, called A B II. We use (N, M, A, B, II) to express HMM for short. Given a particular set of parameterized models, two questions can be answered: For a given observed sequence, which model is the most likely to explain this data, and for a given sequence and a given model, what is the most likely reconstruction of the path through the states. In addition, models can be learned from the data, in which parameters are estimated by expectation-maximization techniques. It is very natural to use a Bayesian statistical framework with HMMs. For biological sequences, the time dimension is replaced by the position in the sequence. Hidden Markov models prove so successful in this field because they can naturally accommodate variable-length models of regions of sequence.A profile HMM is a certain type of Hidden Markov Model with a structure that in a natural way allows position dependent gap penalties. A profile HMM can be obtained from a multiple alignment and can be used for searching a database for other members of the family in the alignment very much like standard profiles that shows above. In order to build a Hidden Markov Model, we should confirm the number of each status based on the average length of training protein sequence. Then, choose training methods to modify parameters of models. The training of Hidden Markov Model is to modify the parameter inside, so the model can be used to train all the training sequence and can bring out the training observed sequence mostly. A trained model can represent the protein sequences which have the same characteristics.Hidden Markov model used in multiple alignment is a new field of bioinformatics. It can be used to distinguish protein sequence with the same characteristics. But the problem is that the two parameters estimation algorithms for HMMs, the Viterbi algorithm and Baum-Welch algorithm, are only guaranteed to find a local optimum. So a novel method based on genetic annealing evolution is designed in this paper. The genetic algorithm has the advantage of optimization in the overall situation. And, the simulated annealing algorithm can be capable of optimizing in part. So the combined algorithm was brought forward: simulated annealing algorithm to improve genetic algorithm. Experimental results show that this algorithm is more superior to those traditional methods in finding a global optimum.
Keywords/Search Tags:Alignment
PDF Full Text Request
Related items