Font Size: a A A

Research And Implementation Of Approximate String Matching Algorithm Supporting Swapping

Posted on:2020-06-23Degree:MasterType:Thesis
Country:ChinaCandidate:B C LiFull Text:PDF
GTID:2428330575977300Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Because with the improvement of network equipment hardware and software technology and the increasing number of network users,data flow on the network is growing in an explosive trend.With the development of big data and those related technology,the algorithms for processing big data will face even more severe challenges.Pattern matching that regarded as the basis of many cutting-edge technologies is widely studied,String approximation matching is often used for information searching,security detection,Biological computing,etc and is the branch of pattern matching.Improving the performance of high-performance approximate pattern matching algorithms that solve big data application problems is a very valuable issue at present.This paper introduces a new type of approximate pattern matching problem—an approximate matching algorithm that allows the exchange of adjacent two characters in text,and proposes corresponding algorithms with theoretical analysis and experiments.First of all,this paper defines the concept of swap,and explains that the condition of swap.Swap only can occur between adjacent two characters that differ from each other.Then swap version of a string indicates that the string is converted into a string after several swap operations.Based on this,we propose an approximate matching algorithm that allows the swap of two adjacent characters in strings.The algorithm first uses the idea of filtering to divide the text string into two sets of text segments:Mismatched collection and Candidate text segment set.We improved the previous approximate string matching algorithm,extend the range of the algorithm to the case where additional swaps are allowed.Different from existed algorithms,this paper show the algorithm can be more suitable for the requirements of real-world application scenarios,and have a certain practicality.Additionally,in order to improve the matching efficiency,based on the above filtering ideas,we directly filter out matching sets that are unlikely to match.we apply the improved approximate matching algorithm to the candidate segment set that will find all approximate matching locations as efficiently and accurately as possible.The general theoretical time complexity of approximate string matching algorithm is often between(On~2)andO(mn).We prove that the time complexity of the algorithm proposed in this paper is(Om+kn)in general,As stated in the paper,the matching can be completed in less time complexity because the algorithm utilizes the characteristics of exact matching and high efficiency in design.The entire algorithm uses exact matching to search for candidate regions during most of the filtering process.Our algorithm uses exact matching to search for candidate regions during most of the filtering process,in general the time complexity of the algorithm is closer to exact matching.In order to verify algorithm performance,we used two different test data to make a comparison experiment.Use existing algorithms to compare with the algorithm proposed in this paper,Under the same conditions,the experiment shows that the performance of the proposed algorithm is far better than the previous algorithm under a certain small range of mistakes.However with the relaxation of the error limit,The algorithm will show a drawback,the execution efficiency of the algorithm will gradually decrease,this will be a question that needs to be explored after this paper.But in reality,the rate of text error is often lower than the experimental data used in the experiment.Therefore,the algorithm has higher matching efficiency in practical application scenarios.
Keywords/Search Tags:Algorithms, Exact-matching, Approximate-matching, Swap, Filter
PDF Full Text Request
Related items