Font Size: a A A

Study On Fast Algorithms For Edit Distance

Posted on:2012-09-10Degree:MasterType:Thesis
Country:ChinaCandidate:P P WangFull Text:PDF
GTID:2298330467478637Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
String matching,as one of the most classical and valuable research subjects in Computer Science, has been applied to many fields. Usually, edit distance is the standard method to measure the matching similarity of two strings, So it has become a classical subject in the fields of information theory and computer science research.Edit distance is also called Levenshtein Distance, which is introduced by Russian scientist Vladimir Levenshtein in1965.At present, the most frequently used calculation method for edit distance is called dynamic programming algorithm. The time complexity of dynamic programming algorithm is O(mn), with very high time cost level.Considering this, this thesis take the futher research and improvement on the edit distance algorithm.The main contents of this thesis include the following:firstly, it mentions definition of the approximate string matching, and introduces related techniques and main research methods; in addition, the thesis makes research on sequence similarity based on FFT(Fast Fourier Transform). this thesis will make study and improvements on better calculation of edit distance with the idea of fast fourier transform and linear convolution,proposed a new FFT-based string-distance-function FFT-D,whice can filter the data set and save computing time. By filtering the data set, to reduce unnecessary to the edit distance calculation, thereby increasing the efficiency of string matching..In addition, basing on these, a new method of linear-convolution-based edit distance algorithm LC-ED is introduced. This algorithm’s most important characteristic is that the insert operating and delete operating are replaced by inserting a blank space.Though two strings compare in differet alignment situation,find out the number and location where insert the space,in order to achieve rapid calculation of edit distance.This algorithm performs well both in theory and in practice, and has been approved by many tests as a good method to efficiently solve fast calculation of edit distance.
Keywords/Search Tags:edit distance, string match, sequence similarity, fast fourier transform, dynamicprogramming
PDF Full Text Request
Related items