Font Size: a A A

A New Theory Of Automata Construction(PFA)

Posted on:2017-07-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:M H JingFull Text:PDF
GTID:1368330572965446Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Regular expressions are widely used in many fields because of its flexibility and high efficiency in pattern matching.In all of the pattern matching applications based on regular expression,the regular expressions are needed to convert into the equivalent finite automata,and to implement them.The theory and algorithms which are used to transform the given regular expressions into equivalent finite automata are called automata constructing.Usually from two aspects to evaluate the effectiveness of constructing algorithm:one is the size of the automaton,the smaller is better;the other is the complexity of the converting algorithm,the lower is better.This thesis presents a new automata construction theory and names it PFA(Postfix Finite Automata).PFA contains the basic definition,concept,various theorem and laws to ensure the completeness and correctness of the theory.It provides the proof of the correctness and efficiency of the theorem and laws and analyzes the computational complexity of PFA construction algorithms.A new set of methods and algorithms of utilizing PFA regular structure engine are proposed.This thesis also presents a optimization algorithm for the convertion of NFA into DFA.The simulation in FPGA hardware regular engine based on PFA is implemented.Theoretical computer science is the foundation of computer science.And formal linguistics and automata theory are very significant in theoretical computer science.In particular,finite automata and regular expressions are widely used in natural language understanding,pattern matching,exchange theory,automatic control,compiler construction.DNA sequence alignment,optical character recognition,data compression and encryption,communication protocol analysis and so on.Finite automata theory mainly focuses on regular grammars and regular sets,the properties of finite automaton and its decision problems,the equivalence between finite automaton and regular expressions,and the determination of finite automaton,and the minimal processing,and so on.The equivalence here refers to the fact that they are signifying or recognizing the same formal languages.The related study and achievements are instructional for the application of regular expression and finite automata in various fields.Regular expression even replaces traditional exact string matching with its high flexibility and pattern matching technique.The pattern matching uses regular expression to describe the pattern bank and inspects by constructing the equivalent finite automata in practice.The finite automata to run the pattern matching is called Regular Engine.The construction of finite automata is the process of converting regular expressions into equivalent finite automata.The low efficiency of regular engine is an urgent problem to be solved in the regular expression and the poor automaton technology and application.Many domestic and foreign experts and scholars are studying how to improve the matching efficiency of the regular engine.Approaches such as rule rewriting,group matching,state merging,and hybrid automata,have been proposed to improve the matching efficiency of large-scale regular expression.Among these approaches,starting from the basic concept and theory of finite automat,effectively reducing the size of finite automata to enhance the efficiency of the regular engine is a way to fundamentally resolve the regex engine bottleneck problem.Starting from the basic theory.this thesis presents a new set of automata construction theory and algorithm to fundamentally resolve efficiency problem that various regular expression based matching technologies and applications face in practice.This thesis first presents the basic concepts and definitions of Postfix Finite Automata,PFA.It then proposes a complete set of theorems,including smallest size theorem,meta,symbol,impotency law,and absorption law.These theorems guarantee the closure property,regular property of PFA's homomorphy arithmetic,therefore the theorems correctness and completeness are proved.This thesis also demonstrates the correctness and effectiveness of the algorithm and comparatively analyzes of the computational complexity of the algorithm.This thesis slightly studies the application of the PFA theorem in the field of network security,however,regular epression and finite atomata technologies have been widely used in various fields.The PFA theorem and its algorithms and approaches are applicable to any regular expression and Finite Automata application fields.Based on the fnite automata's construction algorithm and theory,this thesis launches a series of study by focusing on how to adopt a simpler approach to build smaller size of finite automata to fundamentally simplify the regular engine implementation and therefore improve regular engine's matching efficiency.Our contribution can be summarized as follows.1)Present a new automata construction theory,PFA.PFA converts regular expression to postfix expression,based which to construct parse tree and finally come up with a series algorithms of uncertain finite automata on the basis of tree traversal.2)Propose an optimizing coding algorithm using postfix parse tree in order to reduce the scale of automata effectively.The algorithm can effectively reduce the number of states of automaton under the homomorphism and closure,and realizes the identity of the state.Although the state identification is simple and appears to be independent of the state merging and spatial compression,it can be directly realized by the combination of the equivalent state and the elimination of redundant switching arcs.The difficulty of the construction algorithm of automaton affects the implementation of regular engine in many systems such as IDS,IPS,DPI,etc.To make the algorithm itself simple and easy to implement,the thesis proposes an algorithm based on the suffix encoding tree times to construct a smaller size of the finite state automaton.This algorithm is less complex,and able to construct much smaller automata than widely used Thompson algorithm and Glushkov algorithm.3)Use hybrid regular engine by compromising the advantages and disadvantages of NFA engine and DFA engine to increase system's efficiency is a common and effective method.PFA can construct NFA with a very simple algorithm.The sub-expression,recognition,and process of automata can simply and effectivly implement the construction of NFA/DFA hybrid engine.Meanwhile,it can construct smaller scale automata based on sub-expression,which effectively increase hybrid engine's efficiency.4)The complexity of deterministic algorithm of converting NFA into DFA will affect the matching effectiveness of DFA and NFA/DFA hybrid engine.An optimized NFA deterministic algorithm was put forward.and it outperforms traditional approch with easier realization and less complexity.5)Hardware has the nature of concurrency and high speed.The regular engine based on hardware implementation is still capable of matching,performance with high throughput rate.Therefore FPGA based hardware design has become a hot research topic in recent years.To apply PFA theory and algorithm proposed in this thesis into the design of FPGA NFA engine,it can use the same FPGA resource to realize more matching of regular expressions and consequently increase FPGA NFA engine's performance.Testbench result shows this method can realize the expected matching requirement,and greatly improves the amount of regular expressions and its matching speed through effectively reducing its scale.
Keywords/Search Tags:Finite automata, regular expression, construction algorithms, postfix automata, FPGA
PDF Full Text Request
Related items