Font Size: a A A

Approximate String Matching And Optimizition Techniques Using Reversed Filter

Posted on:2014-11-13Degree:MasterType:Thesis
Country:ChinaCandidate:T PeiFull Text:PDF
GTID:2268330425991796Subject:Computer technology
Abstract/Summary:PDF Full Text Request
As computer science and technology is developing rapidly, and its application is used in life more and more. Because the string is a basic data structure, structures and algorithms deal with string has been widely studied and applied in recent years. When the information is input and transmitted, it is inevitable that there will be a lot of errors, which requires some approximate string processing algorithms. Be different from precise processing algorithms, there are many issues remain unresolved which is about approximation algorithms. The paper select a most common problem to study-approximate string matching problem. There are many existing approximate string matching algorithms, one of the most classical algorithm is count filtering algorithm which is based on grams.The main contribution of this paper is to propose a more capable filter algorithm than the count filter algorithm. The basic idea is different from count filtration, so this article called it reversed filter. The basic idea of reversed filter is that if the edit distance between two strings is not more than a certain value, then the edit distance between a substring of a string and another substring corresponding to this substring must be smaller than the value. Based on this basic idea, this article studies reversed filter systematically. Define MinEd which can be used as the basic unit of reversed Filter. Prove the relationship between MinEd and global edit distance. By the relationship of MinEd and global edit distance, draw an important conclusion-that is the lower bound of edit distance between two strings. Using this lower bound, we obtained string filtering algorithms.In order to improve the filter effect, based on MinEd definitions and theorems, we propose MinEd accumulate theorem. MinEd accumulate theorem can accumulate MinEds, greatly improve the reversed filter algorithm and supports a larger query threshold on edit distance. Based on MinEd, we propose PMinEd with position constraints which proved a better filter ability than MinEd. In order to break through in the limiting of edit distance in preprocessing stage, this paper presents a dynamic expansion in query stage, solves this problem. Ensure the practicability of the algorithm. This paper presents a new approximate string matching algorithm that uses the classic count filter and the proposed reverse filtering. In order to support two filter methods at the same time, we design reversed filter index and a location-based inverted index. By joining these two indexs, we can use reversed filter and count filter at the same time, but in most cases wo can get a very good efficiency by using the reversed filter only. Finally, the experiments demonstrate that the proposed algorithm is correct and efficient.
Keywords/Search Tags:edit distance, approximate matching, inverted filter, string, query optimization
PDF Full Text Request
Related items