Font Size: a A A

The Research And Implement Of Fast And Efficient Multi-pattern Matching Algorithm

Posted on:2018-10-02Degree:MasterType:Thesis
Country:ChinaCandidate:Z Y JingFull Text:PDF
GTID:2348330518998938Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Pattern matching algorithm is one of the most classic research direction in computer science,and is widely used in many research areas like information retrieval,intrusion detection systems,virus detection,information filtering and computational biology.Multipattern matching covers the issue of finding all possible occurrences of multiple patterns inside textual content in single pass.In recent years,multi-pattern matching has become a hotspot of pattern matching.Wu-Manber algorithm(WM algorithm for short)is one of the most effective algorithms for Multi-pattern matching and has a flexibility to skip the unnecessary matching of text characters.However,the performance of Wu-Manber algorithm depends largely on three tables(PREFIX table,SHIFT table and HASH table)built in pre-processing stage.These three tables have big disadvantages and can be improved greatly.This paper improves WM algorithm.The main work is as follows:1.We propose a new algorithm namely the Adaptive Hash Wu Manber algorithm(AHWM for short)via improving PREFIX table,SHIFT table and HASH table.The basic improvement strategy of this algorithm includes three aspects.Firstly,the hash value of the prefix in PREFIX table is changed.By increasing the length of prefix from B bits to lsp bit,more mismatched patterns will be filtered.Secondly,two jump tables SHIFT1 and SHIFT2 will be built.The SHIFT2 table has the same function as that of the SHIFT table in WM algorithm.The SHIFT2 table is used to record the maximum safe distance that the sliding window can jump after the end of a match.Due to the SHIFT2 table,more characters that do not need to be compared can be skipped and the speed of pattern matching will be accelerated.Thirdly,the data structure of HASH table entries is changed from the linked list to the adaptive hash sub-structure HASH_STRUCT proposed in this paper.HASH_STRUCT can dynamically select an appropriate data structure to store the patterns based on the number of patterns in the current hash table slot.When the patterns set is very large,compared with linked list,HASH_STRUCT can reduce search time in the HASH table and ensure to take up as little memory as possible.2.We propose a new algorithm named the Efficient Parallel Multiple Patterns Matching(EPPM)algorithm which improves AHWM algorithm.Although the AHWM algorithm has a very good time performance,it is sensitive to the length of the shortest pattern in pattern set.When the length of the shortest pattern in pattern set is very small,the performance of the algorithm becomes worse.EPPM algorithm overcomes this shortcoming.According to the length of pattern in pattern set,EPPM algorithm divides the original pattern set into four subsets SET1,SET2,SET3,and SET4.For each subset,the most proper matching algorithm is used.The matching algorithms for different subsetis are independently executed.For SET1 and SET2,TABLE1 and TABLE2 are used for pattern matching.For SET3,AC algorithm is used for pattern matching.For SET4,AHWM algorithm is used for pattern matching.The pattern matching for four subsets is performed simultaneously.3.We conducte a number of experiments to demonstrate the validity of the AHWM algorithm and EPPM algorithm.
Keywords/Search Tags:multi-pattern matching, Wu-Manber algorithm, AHWM algorithm, EPPM algorithm
PDF Full Text Request
Related items