Font Size: a A A

Research On Regular Expression Matching Technology In Deep Packet Inspection

Posted on:2019-01-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:C C XuFull Text:PDF
GTID:1368330623450422Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Modern network services increasingly rely on the deep packet inspection(DPI)technology to perform the identification on the packet payload,such as the application protocol identification,network intrusion detection,application-based bandwidth management,etc.The DPI technology matches the predefined pattern set with the packet payload,to judge whether the payload matches some pattern(s)in the set.Due to the powerful and flexible expressive ability of the regular expression,it is widely used for pattern description in the DPI based applications.Regular expression matching(REM)is usually implemented through the finite state automaton,and the increasing pattern scale has brought great challenges to the compilation of the pattern set,deployment of the automaton and the matching performance.On the other hand,the increasing link speed of backbone Internet also puts higher requirements on the performance of DPI.To address these issues,this dissertation studies on three aspects,namely the hardware architecture,the grouping algorithm and the automaton design.The dissertation starts from a comprehensive survey on the regular expression matching technology,we systematically introduce the application background,technical principles,software solutions and hardware solutions of the regular expression matching in DPI.Combined with a practical hybrid automaton,we devise a novel hybrid architecture based on FPGA+Multicore platforms for this automaton.By modeling and analyzing the rule grouping problem of the REM,we propose a compact and efficient heuristic grouping algorithm and the corresponding optimization algorithm.Motivated by the source of the automaton state explosion,we devise a novel automaton named Offset-FA to solve the state explosion problem,and the proposed Offset-FA achieves a satisfactory balance of the space requirement and matching efficiency.The contributions of this thesis could be summarized as follows:1)This dissertation provides a detailed survey on the regular expression matching techniques in the deep packet inspection applications.Firstly,the application background of DPI and the common methods of DPI are introduced.Subsequently,the technical background of regular expression matching is introduced,and it is pointed out that state explosion is the main challenge for regular expression matching.The causes of the state explosion are discussed from two aspects,namely the features of the patterns and semantic relationships among states.Further,current solutions for regular expression matching are roughly divided into the software-based automaton optimization methods and the hardware-based parallel acceleration methods,and each kind of methods are further classified into fine-grained categories.The optimizations of automaton are mainly divided into the automaton compression solutions and the scalable automatons,which includes the rule grouping,semideterministic automatons,and the automatons with additional labels and operations.Parallel acceleration is mainly based on the following platforms: FPGA,GPU,general purpose multi-core CPU and TCAM.The dissertation also provides a detailed comparison of various software solutions and hardware solutions.Finally,based on the above comparisons,we put forward specific guidelines for constructing efficient DPI systems under different application background.2)The hybrid-FA[1]is a highly practical automaton,which is also reflected in some industrial products.However,the hybrid automaton is faced with the problem of the large head automaton and the low-efficiency tail automatons.The large head automaton prevents it from being deployed in high-speed but low-capacity on-chip memories,while the inefficient processing of the tail automatons would severely impact the overall matching performance.This dissertation presents a storagecentric FPGA + Multi-core hybrid architecture.The matching engines of this architecture consist of two parts,namely the hardware engines on the FPGA part and the software engines on the multi-core processor part.The hardware engines and the software engines work cooperatively in a pipelined manner.The payload of a packet is sent to the hardware engine for processing first.When the tail automaton is activated,the remaining unprocessed payload part is handed over to the software engines for further processing.The pipelined approach can isolate the processing of the inefficient tail automaton,thus to avoid its impact on the overall performance.On the other hand,the hardware engines adopt the two-level storage architecture of on-chip RAMs and off-chip DDR3 SDRAM,where the frequently accessed head automaton states are configured on the on-chip RAMs to realize the high-speed matching of the hardware engine part.Simulation experiments show that hybrid-FA can achieve high-speed matching on the FPGA+Multi-core matching architecture.3)Rule grouping is a natural way to effectively avoid the state explosion problem.Currently,the common grouping algorithms can not balance the time cost of grouping and the grouping result.With proper modeling and analysis,the rule grouping problem can be reduced to the maximum k cut graph partitioning problem.This dissertation explores the applicability of simulated annealing algorithm and genetic algorithm for the rule grouping problem.At the same time,a compact and efficient heuristic algorithm named one-step greedy(OSG)algorithm and the corresponding optimization algorithm named the heuristic initialization(HI)algorithm are proposed.The experimental results show that the OSG algorithm can obtain the grouping effect comparable to the current optimal grouping result under the premise of greatly reduced grouping time.4)In order to solve the state explosion problem from the root,this dissertation targets on the characteristics of the rules,and points out that the features of character set with repeated restrictions(LCSR)are the main reason leading to state explosion.A new automaton named the Offset-FA is proposed to solve the state explosion problem.According to the positions where these LCSR features appear,the original rules are cut into multiple fragments,and then these fragments are compiled into a deterministic finite automaton(DFA).In addition,a fragment relation table and a reset table are generated to record the relationships among these fragments,meanwhile maintaining the semantic equivalence with the original rule set.After removing the LCSR features,the state explosion problem is handled effectively.In addition,we further improve the matching efficiency of the Offset-FA with a series of optimizations on its compilation algorithm and matching algorithm.Experimental results show that the Offset-FA achieves a superb balance of the space cost and matching efficiency,and it solves the state explosion problem while achieving better matching efficiency than current solutions.
Keywords/Search Tags:Deep packet inspection, Regular expression matching, State explosion, Architecture, Rule grouping
PDF Full Text Request
Related items