Research Of Finding Maximal Unique Matches In Genome

Posted on: 2010-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
GTID: 2178330332998586
Subject: Computer software and theory
Abstract/Summary:
Maximal unique matches (MUMs) play an important role in gene sequence alignment. Sequence alignment can reconstruct a complete DNA sequence from a series of overlapping gene fragments, determine physical and genetic maps from probe data obtained under a variety of test conditions, and estimate the similarity of two or more sequences by traversing and comparing the DNA sequences in a database. This thesis first briefly introduces existing algorithms from domestic and international research, and then presents an algorithm, based on a suffix array, for finding and sorting MUMs. The algorithm first constructs a suffix array of the two sequences. It then computes the longest common prefix (LCP) values by comparing the suffixes that are adjacent in the suffix array, and derives the MUMs by scanning for candidates whose LCP values meet the required conditions. Finally, it applies a longest increasing subsequence (LIS) algorithm to obtain the sorted MUMs. Experimental results show that, under the same conditions, finding and sorting MUMs with a suffix array uses 28% less space than the algorithm based on a suffix tree.
Keywords/Search Tags:Suffix tree, Suffix array, MUM, LIS
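
The pipeline described in the abstract (suffix array, LCP values of adjacent suffixes, MUM scan, LIS ordering) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the separator character, and the naive sort-based suffix array construction are all assumptions chosen for readability, and the sketch makes no attempt to reproduce the reported space savings. It assumes the separator does not occur in either input sequence.

```python
from bisect import bisect_left


def suffix_array(text):
    # Naive construction by sorting suffixes; fine for short demo inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0.
    def common(a, b):
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        return n
    return [0] + [common(sa[k - 1], sa[k]) for k in range(1, len(sa))]


def find_mums(s, t, sep="\x01"):
    # Hypothetical helper: returns (pos_in_s, pos_in_t, length) triples sorted by pos_in_s.
    text = s + sep + t
    sa = suffix_array(text)
    lcp = lcp_array(text, sa) + [0]  # trailing sentinel simplifies the scan
    offset = len(s) + 1              # start of t inside text
    mums = []
    for k in range(1, len(sa)):
        a, b = sorted((sa[k - 1], sa[k]))
        length = lcp[k]
        # The adjacent pair must consist of one suffix from s and one from t.
        if length == 0 or not (a < len(s) < b):
            continue
        # Uniqueness: neither neighbouring suffix shares the whole match.
        if lcp[k - 1] >= length or lcp[k + 1] >= length:
            continue
        p, q = a, b - offset
        # Left-maximality: the preceding characters must differ (or not exist).
        if p > 0 and q > 0 and s[p - 1] == t[q - 1]:
            continue
        mums.append((p, q, length))
    return sorted(mums)


def longest_increasing_chain(mums):
    # LIS over positions in t, for MUMs already sorted by position in s.
    tail_q, tail_idx = [], []        # best chain end for each chain length
    prev = [-1] * len(mums)
    for i, (_, q, _) in enumerate(mums):
        j = bisect_left(tail_q, q)
        if j == len(tail_q):
            tail_q.append(q)
            tail_idx.append(i)
        else:
            tail_q[j] = q
            tail_idx[j] = i
        prev[i] = tail_idx[j - 1] if j > 0 else -1
    chain = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(mums[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    mums = find_mums("abcXdefYghi", "abcPghiQdef")
    print(mums)
    print(longest_increasing_chain(mums))
```

For the two toy sequences in the usage lines, the scan reports the matches "abc", "def" and "ghi"; because "def" and "ghi" appear in swapped order in the second sequence, the LIS step keeps only a chain of two matches that appear in the same order in both sequences.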