
Research On Index Technology Of String Approximate Query

Posted on: 2013-07-13   Degree: Master   Type: Thesis
Country: China   Candidate: X Tong
GTID: 2268330392467991   Subject: Computer technology
Abstract/Summary:
With the growing spread of the information society, string handling plays an increasingly important role in information systems and has broader significance. On the one hand, many new problems can be converted into string manipulation problems and solved with novel approaches. On the other hand, data quality problems make exact string query processing difficult. As a result, many researchers have turned their attention to approximate string query processing. An approximate search query on a collection of strings finds those strings in the collection that are similar to a given query string, where similarity is defined by a given similarity function. Approximate string matching raises several technical challenges, including defining the metric function for approximate string queries, building the index structure, processing large amounts of data, and incorporating string weights.

This thesis analyzes existing work on approximate string queries, covering both queries with weighted values and queries without them. We found that current index structures for approximate string queries share several weaknesses: the index cannot be updated, query efficiency is low, the supported query types are limited, the query string length is limited, and only a fixed threshold is supported. To address these problems, this thesis proposes two new index structures, Fgramtree and Weitree, together with novel query algorithms based on them. Fgramtree directs similar strings to the same node in order to speed up queries. Weitree supports mixed queries over a string and a numerical value and is mainly used for approximate string queries with weighted values. Experiments on real data sets show that our query algorithms and index structures outperform the traditional ones.
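To make the definition of an approximate search query concrete, here is a minimal Python sketch. It does not reproduce Fgramtree or Weitree, whose internals the abstract does not describe. Instead it uses a standard, swapped-in technique: a q-gram inverted index with length and count filtering, followed by edit-distance verification, and it assumes edit distance as the similarity function. The count filter relies on the known bound that a string within edit distance k of the query shares at least max(|s|, |t|) + q - 1 - k·q padded q-grams with it, since each edit destroys at most q grams. The optional `min_weight` predicate is a hypothetical stand-in for a query with a weighted value. All names (`QGramIndex`, `search`, `min_weight`) are illustrative, and the padding characters `#` and `$` are assumed never to occur in the data.

```python
from collections import Counter, defaultdict


def qgrams(s, q):
    """Return the q-grams of s, padded with q-1 '#' on the left and q-1 '$' on the right.

    Assumes '#' and '$' never occur in the data; a string of length n yields n + q - 1 grams.
    """
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


class QGramIndex:
    """Hypothetical q-gram inverted index over (string, weight) records."""

    def __init__(self, q=2):
        self.q = q
        self.records = []                  # record id -> (string, weight)
        self.postings = defaultdict(list)  # gram -> [(record id, gram count)]

    def add(self, s, weight=0.0):
        rid = len(self.records)
        self.records.append((s, weight))
        for gram, count in Counter(qgrams(s, self.q)).items():
            self.postings[gram].append((rid, count))
        return rid

    def search(self, query, k, min_weight=None):
        """Return (string, weight) pairs within edit distance k of query.

        If min_weight is given, records with a smaller weight are skipped
        (an illustrative mixed string/numeric predicate).
        """
        # Multiset overlap between the query's grams and each record's grams.
        shared = defaultdict(int)
        for gram, qc in Counter(qgrams(query, self.q)).items():
            for rid, rc in self.postings.get(gram, ()):
                shared[rid] += min(qc, rc)

        # If the count bound is positive for every record, a record sharing no
        # gram cannot match, so only records in `shared` need checking.
        # Otherwise the filter is powerless and every record is scanned.
        if len(query) + self.q - 1 - k * self.q > 0:
            candidates = shared.keys()
        else:
            candidates = range(len(self.records))

        results = []
        for rid in sorted(candidates):
            s, w = self.records[rid]
            if min_weight is not None and w < min_weight:
                continue
            if abs(len(s) - len(query)) > k:       # length filter: an edit changes length by at most 1
                continue
            bound = max(len(s), len(query)) + self.q - 1 - k * self.q
            if shared.get(rid, 0) < bound:         # count filter
                continue
            if edit_distance(s, query) <= k:       # verification
                results.append((s, w))
        return results


if __name__ == "__main__":
    index = QGramIndex(q=2)
    for text, weight in [("query", 0.9), ("quarry", 0.4), ("string", 0.7), ("strong", 0.2)]:
        index.add(text, weight)
    print(index.search("strng", 1))                  # [('string', 0.7), ('strong', 0.2)]
    print(index.search("strng", 1, min_weight=0.5))  # [('string', 0.7)]
    print(index.search("qurey", 1))                  # []
```

In this sketch the filters only prune candidates and never decide a match; the final edit-distance check does. Supporting a different threshold per query is trivial here because k is a search argument rather than something baked into the index.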
Keywords/Search Tags: index, data quality, approximate string search, weighted value