Approximate Pattern Matching With Gap Constraint Under The Edit Distance

Posted on:2015-08-20

Degree:Master

Type:Thesis

Country:China

Candidate:S Ren

Full Text:PDF

GTID:2298330452494223

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

The problem of pattern matching is the oldest and most widely studied in computerscience, with the increasing scale of the text that people need to deal, and the search in textis become more complex, the approximate pattern matching that allowed some slightdifferences between text and mode has been widely used. But the research in connectionwith the problem of approximate pattern matching with gap constraint has not yet mature,so this thesis opens up the research and implementation contrary to this issue.The problem was divided into two parts, approximate string matching problem and thepattern matching with gap constraint. In aspect of approximate string matching, firstly thepaper introduces some basic string matching related knowledge, including the concept ofstring matching as well as some traditional string matching methods. Then studies themethods of approximate string matching, including dynamic programming, automata, bitparallel and text filtering. Through researching and comparing four methods, the paper willuse dynamic programming as the primary method for approximate string matching, and useedit distance model as the measure of string similarity model, programming of theestablishment of the editing distance matrix as well as the calculation of string edit distance.In aspect of pattern matching with gap constraints, the text string is saved as plain text, weindexed each string in the text that same as the way Lucene inverted indexed plain textdocuments. This index contains the string position and the number of occurrences, on thebasis of the index, and we used SpanNearQuery in Lucene to search the pattern. Becausewhen the text is too large, common ordered method will cost a lot of time, so we abandon it,and it brings us to a very high matching efficiency.Finally, we compare the paper’s algorithm and common order algorithm, under thepremise of guaranteed the same text, the same pattern and the same result, after a lot ofexperiments and analysis of the experimental results, we can see that when the text is largeand not change frequently, the algorithm in the paper consumes a reasonable space inexchange for the absolute reduction in time, it’s better than ordinary sequential algorithm.

Keywords/Search Tags:

approximate match, gap constraint, edit distance, Lucene, invertedindex

PDF Full Text Request

Related items

1	Efficient Approximate String Matching With Edit Distance Constraint
2	Approximate Longest Common Substring Matching And Optimization Techniques With Edit Distance Constraint
3	Sequentially Matching Similarity String Algorithm Research
4	Approximate String Matching And Optimizition Techniques Using Edit Distance
5	Study On Fast Algorithms For Edit Distance
6	Research On The Solving Technology Of Graph Edit Distance
7	Research On Approximate String Matching And Its Application On URL Detection
8	Research Of Methods Of Data Cleaning For Hotel Entity Based On Edit Distance And Conditional Functional Dependencies
9	Approximate String Matching And Optimizition Techniques Using Reversed Filter
10	Parallel Approximate String Matching Algorithm Based On Gpu