Font Size: a A A

Approximate String Matching Based On Sufifx Array

Posted on:2013-02-21Degree:MasterType:Thesis
Country:ChinaCandidate:X J ZhangFull Text:PDF
GTID:2248330395955364Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
String matching is one of the basic problems in computer science. A lot ofresearches were focusing on exact string matching in early time, and many algorithmsof single and multiple string matching have been proposed. However, in applicationssuch as information retrieval, pattern recognition and computational biology,approximate string matching makes more sense. So there is so much theoretical valueand practical meaning to research and design efficient approximate string matchingalgorithms.In the beginning, this paper introduces the research status and basic theories ofapproximate string matching, and then presents the main approaches which includedynamic programming, automata, bit-parallelism and filtering. Then we come up withour approximate string matching over suffix array algorithm which is short for IS_DPalgorithm. IS_DP algorithm uses suffix array for accelerating the computation along thedynamic programming table and reaches running time in O(kn). Suffix array are usedfor preprocessing the sequences allowing an O(1) running time computation of thelongest common extension between substrings. Compared with the classic approximatestring matching algorithm which is based on dynamic programming, our algorithmshows pretty good time complexity which is O(kn); besides, compared with someapproximate string matching algorithms which are based on suffix tree, IS_DPalgorithm does improve the actual space usage.
Keywords/Search Tags:dynamic programming, suffix array, edit distance, approximatestring matching, longest common prefix
PDF Full Text Request
Related items