Font Size: a A A

Approximate String Matching Algorithms And Optimizition Techniques Using Local Filtering

Posted on:2015-01-24Degree:MasterType:Thesis
Country:ChinaCandidate:Y S WangFull Text:PDF
GTID:2308330482452491Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As computer science develops rapidly, many applications are widely used in the society. Meanwhile, the technology of computer science brings much convenience to human beings. The string is a basic and important data structure, so it is being used to store large amount of data. How to retrieve the matching strings from a large string collection is still a big research problem.Especially string similarity matching has many critical significance and technical challenge. We explore approximate queries with edit distance constraints and devise efficient matching algorithms.We first summarize many typical filtering and trie-based pruning techniques used in existing methods and analyze these techniques which adopt an idea that two similar strings must have some common substrings. Based on this, we propose a new filtering technique, called local filtering whose main idea is that two dissimilar strings mush be composed of many dissimilar substrings. We need to discover these dissimilar substrings when doing the filtering. Then we formally define local distance and use it to measure the local difference between two strings. According to the definition of local distance, we propose an accumulation theorem to do the filtering. In order to improve the filtering effect, we propose a local filtering with positional constraint which considers positional information in the filtering. According to the local filtering, a new index structure BitTree is devised to calculate local distances efficiently. Further, local distance can also be used in the verifying step. Due to the large space of BitTree, we propose two other index structures, PBitTree and CoreBitmap, to reduce space and devise estimation methods to filter strings.Final experimental results on three real data sets show that the performance of local filtering is more efficient than the existing methods in the dimension of running time, pruning power and index structure size. Furthermore, the proposed estimation methods also achieve an efficient performance.
Keywords/Search Tags:edit distance, approximate matching, string, estimation method, local filtering
PDF Full Text Request
Related items