Font Size: a A A

Heterogeneous State Transition Method Based On Improved Glushkov Automata

Posted on:2022-12-21Degree:MasterType:Thesis
Country:ChinaCandidate:Z H HongFull Text:PDF
GTID:2518306761459324Subject:Automation Technology
Abstract/Summary:PDF Full Text Request
Regular expressions have the characteristics of flexibility and efficiency.At the same time,regular expressions have strong expressive ability and are widely used in many fields.Non-deterministic finite automata is an important method to realize the application of regular expressions.Glushkov automata is a non-deterministic finite automata different from Thompson automata.Glushkov automata has two major advantages:firstly,the number of Glushkov automata states is equal to m+1(where m is the length of the regular expression non-operator);secondly,all inputs to each state of Glushkov automata are the same character.The combination of Glushkov automata and bit-parallel method is the most commonly used regular expression matching method,such as Navarro-Raffinot method(NR method for short)and state transition method according to the state classification of Glushkov automata(classification state transition method for short).Space consumption is one of the key factors affecting the performance of pattern matching methods.The above methods will cause a sharp increase in space consumption when the length of the regular expression increases,thus affecting the space-time performance of the pattern matching method.Therefore,reducing the space overhead is important for the pattern matching method.Based on this background,this paper proposes to combine the equivalent states of Glushkov automata to obtain an improved Glushkov automata,and apply it to the heterogeneous state transition method proposed in this paper.The main contributions are as follows:Based on the left equivalence relation and the rough right equivalence relation,the improved Glushkov automata reduces the number of states from m+1 to m1 by merging the equivalent states in the Glushkov automata,and reduces the number of out-state(the number of out-state in the Glushkov automata is equal to the number of regular expression substrings plus one)from k+1 to k1.In this paper,the feasibility of applying the improved Glushkov automaton to the bit-parallel method is proved for the first time,and the application of the NR method on the improved Glushkov automaton reduces the space overhead from O(m2m)bits of the NR method to O(m12m1)bits.Based on the improved Glushkov automaton and the classification state transition method,in this paper,new instructions PDEP and PEXT from the BMI2 instruction set are introduced for the first time to realize the high-performance processing of the improved Glushkov automata and a heterogeneous state transition method based on improved Glushkov automata(heterogeneous state transition method for short)is proposed.This method reduces the worst-case space overhead during pattern matching from O(m2m)bits and O(m2k)bits for NR method and classification state transition method to O(k2k1)bits,the average space overhead is,k(1+1/?)k1 bitsExperiments show that the heterogeneous state transition method proposed in this paper can fully achieve the expected matching requirements,the space overhead is reduced to 9.3%-55.5%of the classification state transition method,and the matching speed is also greatly improved due to effective scale compression.This means that the heterogeneous state transition method proposed in this paper solves the problem of high space overhead and provides a new idea for pattern matching methods.
Keywords/Search Tags:Regular expressions, Glushkov's nondeterministic finite automata, Navarro-Raffinot method
PDF Full Text Request
Related items