Research On Approximate String Matching Techniques Based On MC Index Structure

Posted on:2016-07-09

Degree:Master

Type:Thesis

Country:China

Candidate:P Jiang


GTID:2428330542957377

Subject:Computer application technology

Abstract/Summary:


The rapid development of information technology has produced massive amounts of data. This data is widely used, especially in enterprise applications, which places higher demands on enterprise data integration. Similarity queries have become a popular way to retrieve data that satisfies a given condition from very large data sets. Under the filter-and-refine framework, a similarity query consists of index construction, similarity filtering, and verification. Filtering and verification time dominate the cost, and many researchers have focused on these two steps and proposed their own solutions.

This thesis first introduces the background and significance of the research, and then presents the basic theory needed to understand string similarity queries. It reviews recent research results on string similarity queries and analyzes the advantages and shortcomings of the current mainstream algorithms. Most existing string similarity filtering algorithms are fragment-based, and they become inefficient when string lengths in the collection vary widely. To address this problem, the thesis presents a similarity query algorithm based on a new data structure, with the aim of improving the overall performance of string similarity queries while preserving the accuracy of the results. First, we propose an index structure, MC-Substring, improve a parallel algorithm for the LCS problem so that it can solve for the MC pattern, and build an index over the string database using the new structure. Second, we apply filtering rules to the strings; the strings that pass the rules form the candidate set. Third, we verify the candidate set. Finally, we output the result.

We measure the performance of the algorithm, including verification time and space usage. The experiments show that the algorithm presented in this thesis improves the efficiency of similarity query search to some extent. Most importantly, the MC-Substring data pattern offers a new perspective on similar string search: data mining techniques can be applied to the string search problem.
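The index, filter, and verify steps described in the abstract can be illustrated with a minimal Python sketch. It does not implement MC-Substring, which the abstract does not define. As stand-in assumptions, it uses a simple length index, a length filter, and edit distance as the similarity measure; the function names `build_length_index`, `similar_strings`, and `edit_distance` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (assumed similarity measure)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def build_length_index(strings):
    """Index step: group strings by length (a stand-in for the thesis's index)."""
    index = {}
    for s in strings:
        index.setdefault(len(s), []).append(s)
    return index


def similar_strings(index, query, k):
    """Return all indexed strings within edit distance k of the query."""
    # Filter step: two strings whose lengths differ by more than k
    # cannot be within edit distance k, so only nearby lengths are candidates.
    candidates = []
    for length in range(max(0, len(query) - k), len(query) + k + 1):
        candidates.extend(index.get(length, []))
    # Verify step: compute the exact distance for each surviving candidate.
    return [s for s in candidates if edit_distance(s, query) <= k]


index = build_length_index(["kitten", "sitting", "mitten", "cat", "kitchen"])
matches = similar_strings(index, "kitten", 2)
```

The length filter is cheap but weak; the point of a stronger filter is to shrink the candidate set so that fewer exact distance computations are needed in the verify step.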



Related items

1	Research On Top-k String Similarity Search Based On Edit Distance
2	The String Pattern Matching Algorithm Based On Edit Distance
3	Research On Graph Search Problem Based On Edit Distance
4	Improved Edit Distance Algorithm And Its Application In E-government
5	Research On Similarity Search Technique For Big Data
6	Research On String Similarity Search Algorithms
7	Research On Graph Similarity Queries With Edit Distance Constrains
8	Similarity Search On Heterogeneous Information Networks
9	Research On Similarity Query Over Sequence Data
10	An Algorithm Of Computing String Similarity Based On Improved Levenshtein Distance