Font Size: a A A

Study Of Several Algorithms For Alignment Problem Of Sequence And Sequence Secondary Structure

Posted on:2016-08-02Degree:MasterType:Thesis
Country:ChinaCandidate:X Y ZhangFull Text:PDF
GTID:2180330473457179Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In bioinformatics, sequence alignment and sequence secondary structure alignment are the most foundational problem. Since the 1970 when Needleman and Wunsch proposed classic dynamic programming algorithm, how to get more accurate results and higher time and space efficiency of the sequence alignment algorithm and sequence scendary structure alignment algorithm have been an important topic in bioinformatics. Considering higher similarity in actual sequence alignment problem, that is to say number of spaces to insert in each sequence of the best alignment will not be too great, the upper limit number of spaces in this thesis will be considered as the limit to design sequence alignment algorithm, so as to reduce the complexity of the algorithm running time and space.Dynamic programming algorithm proposed by Needleman and Wunsch runs in O(m ?n) time and space, where m and n are the length of two sequences respectively, which still is the most practical and effective algorithm for pairwise sequence alignment. In this thesis, we propose a novel algorithm based on dynamic algorithm of Needleman and Wunsch for pairwise sequences alignment, which runs in O k ?min?,( m n?) time and space, where k is a parameter that means the maximal number of allowed inserted spaces. We introduce the method into Hirschberg linear space algorithm, and proposed a novel linear space algorithm of pairwise sequences alignment under the maximal number of allowed inserted spaces k. Its time complexity is O k ?min?,( m n?), and its space complexity is linear.A large number of actual data show that the number of inserted spaces is within 20% of sequence length for most pairwise sequence alignment. Therefore, in the experiment, when the upper limit is 20% of sequence length, the new algorithm can reduce the nearly 50% of the running time for most cases guaranteed to find the optimal solution. Running time can even reduce nearly 85% for the cases that similarity of sequences is higher.And then the method is extended to three sequence alignment algorithm under the upper limit number of allowed inserted spaces k, which runs in 2O(k ?n) space and time, where k is a parameter that means the maximal number of allowed inserted spaces and n is length of three sequence.Due to the increase of the number sequence in Multiple Sequence Alignment problem, complexity of algorithm increased sharply. The actual efficient algorithms for multiple sequence alignment can’t be acquired even if the concept of k is introduced. Therefore, a fast heuristic algorithm based pairwise sequence alignment for multiple sequence alignment presented by this thesis. Experiments based on the BAliBASE database also verify the accuracy of our algorithm is better than previous best algorithms.The sequence secondary structure alignment algorithm proposed by Bafna introduced by this thesis, which runs in 3 2 2O(mn ?m n) time and space, where m、n is the length of the pairwise sequence. Based on the sequence secondary structure alignment algorithm proposed by Bafna, a novel secondary structure alignment algorithm under the maximal number of allowed inserted spaces 1k presented this thesis, which runs in 2 2 21 1O(k mn ?k n) time and space, where m、n is the length of the pairwise sequence.Finally, we summarizes the research results of sequence alignment and sequence secondary structure alignment problem introdueced k of the concept, and proposes the future research direction for the relevant work and some ways to solve the possible problems.
Keywords/Search Tags:sequence alignment, multiple sequence alignment, dynamic programming, divide and conquer algorithm
PDF Full Text Request
Related items