
Research On Approximate String Matching Techniques Based On MC Index Structure

Posted on: 2016-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: P Jiang
Full Text: PDF
GTID: 2428330542957377
Subject: Computer application technology
Abstract/Summary:
The rapid development of information technology has produced massive amounts of data. These data are widely used, especially in enterprise applications, which places higher demands on enterprise data integration. Similarity queries are becoming a popular way to retrieve data that satisfy a given condition from very large data sets. Under the filter-and-refine framework, a similarity query consists of index construction, similarity filtering, and verification, and its cost is dominated by the time spent on filtering and verification. Many researchers have focused on these two steps and proposed their own solutions. This thesis first introduces the background and significance of the research, along with the basic theory needed to define the string similarity query. It then reviews recent research on string similarity queries and analyzes the advantages and shortcomings of the current mainstream algorithms. Most existing string similarity filtering algorithms are based on fragmentation and become inefficient when the string lengths in the collection differ greatly. To address this problem, the thesis presents a similarity query algorithm based on a new data structure, with the aim of improving the overall performance of similar string queries to some extent while preserving the accuracy of the results. First, we propose an index structure, MC-Substring; we then improve a parallel algorithm for the LCS problem to solve for the MC pattern, and build the index for the string database on the new structure. Second, we filter the strings with a set of filtering rules; the strings that pass these rules form the candidate set. Third, we verify the candidate set using certain measures. Finally, we output the result. We measure the performance of the algorithm, including verification time and space usage, and the experiments show that the proposed algorithm improves the efficiency of similarity queries to some extent. Most importantly, the MC-Substring data pattern offers a new perspective on similar string search: data mining techniques can be applied to the string search problem.
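The index, filter, and verify steps described above can be illustrated with a minimal Python sketch. It does not implement the MC-Substring index or the thesis's filtering rules, which the abstract does not specify. As an assumption, it substitutes a simple length filter (two strings within edit distance `tau` differ in length by at most `tau`) and verifies candidates with the standard dynamic-programming edit distance. The names `build_length_index`, `similarity_query`, and `tau` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance, computed row by row."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def build_length_index(strings):
    """Index step (stand-in, not MC-Substring): group strings by length."""
    index = {}
    for s in strings:
        index.setdefault(len(s), []).append(s)
    return index


def similarity_query(index, query, tau):
    """Return all indexed strings within edit distance tau of query."""
    # Filter step: only lengths in [len(query) - tau, len(query) + tau]
    # can possibly satisfy the threshold.
    candidates = []
    low = max(0, len(query) - tau)
    for length in range(low, len(query) + tau + 1):
        candidates.extend(index.get(length, []))
    # Verify step: compute the exact distance for each candidate.
    return sorted(s for s in candidates if edit_distance(s, query) <= tau)


if __name__ == "__main__":
    data = ["kitten", "sitting", "mitten", "kitchen", "bitten", "a"]
    idx = build_length_index(data)
    print(similarity_query(idx, "kitten", 1))
```

The length filter is deliberately weak; it only shows where a stronger rule set, such as one derived from the MC-Substring index, would plug in to shrink the candidate set before verification.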
Keywords/Search Tags: Similarity Search, Filter Algorithm, Edit Distance, Feature Pattern