Font Size: a A A

Efficient Representation And Implementation Of Position Automata

Posted on:2019-05-05Degree:MasterType:Thesis
Country:ChinaCandidate:B Y WangFull Text:PDF
GTID:2428330548959149Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
The need for regular expression matching has always existed.The general scheme is to construct the corresponding nondeterministic finite automata(NFA).The mainstream nondeterministic finite automata NFA is usually divided into two categories.One is the Thompson architecture,proposed by k.Thompson in the 1960 s.The Thompson architecture has a simple idea,but there are a lot of redundant states,and empty transitions(e-transiton),A typical Thompson architecture includes about m to 2m states(where m is the length of the regular expression).There is also a Glushkov architecture.In contrast,the Glushkov architecture has the advantage that there is no empty transition and the number of states equals the length of the regular expression plus one.There is no empty transition to make Glushkov architecture transfer less than the Thompson architecture,the number of states equal to the length of the regular expression plus one,so that the state of the automaton and the regular expression can easily establish the relationship between the position,Therefore,the Glushkov automaton is also called a position automaton,and bit-parallelism can be used to achieve active state set transfer.In addition,similar to the Glushkov Follow Automaton there is no empty transition,Follow Automaton according to the state of the Follow set element composition of the Glushkov automaton partof the state merger,with fewer state numbers,so in theory have more than the Glushkov automaton Good performance.According to the above properties,these latter two automata are collectively considered as position automata.The main work of this article is divided into two parts:on the one hand improving the bit parallel matching method of Glushkov automaton.Due to the properties of the Glushkov state,this paper divides the state bits of the automaton into several categories for processing,and improves the Navarro-Raffinot method(NR-method)for the bit parallel matching method of general Glushkov automaton.In the NR-method,the update process of status bits has room for improvement.Use the "diffuse-function and extract-function" and "expanded bit-vector" methods to improve the process of updating the set of activation states at each step.Using diffuse and extract functions can complete a word length matching in a constant time,and the bit extension method can also implement bit parallel matching in a constant time for a general position conflict vector.On the other hand,compare the matching efficiency of Glushkov and Follow automata.For some special cases,the Follow Automaton has better performance than the Glushkov Automaton.But from the overall point of view,when the number of regular expression substrings is small,the upper bound of the state number of the states of the Follow Automaton is smaller than that of the Glushkov automaton,which means that the average performance of the Follow Automaton is better than that of the Glushkov automata at this time.With the number of regular expression substrings gradually increases,the upper bound of the state number means of the Follow Automaton is basically the same as the upper bound of the mean value of the Glushkov automata,which means the average performance of the two types of automata is the same at this time.
Keywords/Search Tags:Bit parallel, Regular expression matching, Position automata
PDF Full Text Request
Related items