Font Size: a A A

Study On Efficiency Of Single Pattern Matching Algorithms

Posted on:2014-01-09Degree:MasterType:Thesis
Country:ChinaCandidate:G H PanFull Text:PDF
GTID:2268330401977057Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As a basic problem in computer science, string matching is widely used in many fields. Such as Text Editor, Intrusion detection technology, virus signature matching technology and the Firewall technology, Gene sequence comparison.Sunday is a widely used and effective algorithm nowadays. But when the pattern strings begin with numbers of repeated characters which also exist in text at the same time, the times of attempt matching would increase as rapidly as the number of characters that repeated. Thus, Sunday could not be as efficient as it can, even could not be more efficient than KMP.the reason of this is Sunday used the character of text following the matching window to decide the position to move. The advantage of this is manifest; it could enhance the match speed much more than other algorithms. Whereas the disadvantage is in the condition above, Sunday may be the most inefficient algorithm. Thus, to extend the scope of Sunday that can be applied, and enhance its efficiency, a new algorithm based on Sunday algorithm is proposed. Shorten the number of repeated characters in the start position of pattern string to one before starting to compare. Then the compressed string would be used to compare with the text, if succeed, return beginning position of the fixed child string, or else reach the end of text, return false.after that, find the difference of the compressed string and the original string as n. then loop n times and compare the characters before the returned position in the text, once they are not equal, stop loop and continue to compare the compressed string with the text, or else return success.this can reduce the times of attempt match and enhance the matching speed. The performance of this improved algorithm is analyzed. The experimental results prove the effectiveness of it.Since it is complicated to compute the state translation diagram of string matching algorithms constructed by Markova chain directly, a new method according to the difference of compared numbers in the algorithms was proposed to compute the average efficiency. Firstly choose the BF algorithm as the foundation of other algorithms. After elaborate analyzed the matching process described by a state translation diagram of Markova chains, we finally get a simplified state diagram without absorbing states, the transition probability of each state would limit to a stable mumble when n is large enough. According to the characters above, a formulation which can compute the average efficiency can be promoted.Given the characteristic that Sunday algorithm is the representative of BM classes, the newest modified algorithms of BM, the upgrade version of BM, widely used in practice. So we choose it as the second algorithm to be analyzed. The matching process of Sunday algorithm is very complicated for heuristic jump by bad characters; therefor Markova chain is difficult to be constructed directly. Given this, a new method according to the difference of compared numbers in the algorithms was proposed to compute the average efficiency, analyzing the distinction of the two algorithms. We divided the difference of these two algorithms in two parts by the next character following the matching window of the text belongs to the pattern or not. Compute each part by the corresponding knowledge in Probability theory. The two results are combined to get the equation which represents the average efficiency of Sunday algorithm. To ensure the size of text and alphabet is rigorous the same, a success match can be obtained in the end with no doubt, we change the position of the characters in text to form the new text to be compared to, and looped to do so.until none of new compared times produced, delete data in the result invalid. The experimental results show that the estimated value computed by the equation is the average number of the compared times indeed.
Keywords/Search Tags:Sunday, algorithm efficiency, Markova chain, BF, averagecompared numbers
PDF Full Text Request
Related items