
TCAM-based Fast And Scalable Regular Expression Matching

Posted on: 2014-02-14  Degree: Doctor  Type: Dissertation
Country: China  Candidate: K Y Peng  Full Text: PDF
GTID: 1228330398972870  Subject: Computer application technology
Abstract/Summary:
Regular expression matching is the core engine of many network functions such as intrusion detection and protocol analysis. In practice, regular expression matching is performed using either a deterministic finite automaton (DFA) or a non-deterministic finite automaton (NFA), each having its own merits and drawbacks in matching speed and memory requirement. NFAs are compact in memory, growing linearly with the regular expression pattern set size. However, the matching speed of an NFA can be unpredictably slow. In contrast, a DFA has fast and predictable matching speed. To satisfy the escalating demand for speed, the DFA has been the prevalent choice in high-speed systems. However, the speed of a DFA comes at the cost of excessive storage space: the number of DFA states grows exponentially with the size of the regular expression pattern set. In spite of intensive research, we are still in need of a method for fast and scalable regular expression matching, one that takes a single memory lookup to match each input character (like a DFA) and whose storage space grows linearly with the regular expression pattern set size (like an NFA). Most recently, TCAM (ternary content addressable memory)-based DFA implementation has been proposed as a promising approach, owing to TCAM's unique parallel and wildcard matching capabilities. However, the number of TCAM entries needed is still no smaller than the exponentially growing DFA size and is hence not scalable.

In this dissertation, we propose a TCAM-based DFA deflation method for fast and scalable regular expression matching, which takes one simple TCAM lookup to match each input character and effectively deflates the DFA size. Exploiting the structural connection between the NFA and the DFA enables us to identify DFA states sharing the same NFA origin; these DFA states are in fact replications of the same NFA state and can be merged effectively using a proper DFA state encoding scheme and TCAM entry merging scheme. Experiments based on real-life pattern sets demonstrate that the number of TCAM entries used by our DFA deflation method is up to two orders of magnitude smaller than the DFA size.

We then present the first method for implementing an NFA using TCAM. By exploiting the fact that most NFA states will not be active simultaneously, we partition the NFA states into a number of groups and propose an efficient scheme for encoding the NFA active state set. Through proper TCAM encoding, our TCAM-based NFA approach matches each input byte with a single TCAM lookup, operating at precisely the same speed as a DFA while remaining close to an NFA in size. Extensive experiments based on real-life pattern sets demonstrate that our TCAM-based NFA implementation can reduce storage space and boost matching speed by an order of magnitude compared with existing TCAM-based DFA implementation methods.

All DFA-based approaches need to construct a DFA from an NFA as an intermediate step, so the DFA construction process can itself become a system bottleneck. By exploring the inherent properties of finite automata, namely which NFA states can be active simultaneously and how self-looping NFA states lead to the explosion of the DFA state space, we design a state subset encoding and searching scheme and propose a new DFA construction algorithm. Through experiments based on real-life pattern sets, we demonstrate that our new DFA construction algorithm dramatically reduces the running time by 88.33%~93.57% compared with the standard subset construction algorithm.
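For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the textbook subset construction, not the dissertation's accelerated algorithm. It assumes an epsilon-free NFA whose transitions are given as a dictionary; the names nfa_start, nfa_trans, and alphabet are illustrative only. Each DFA state is a set of NFA states, which is exactly why the DFA can grow exponentially.

```python
# Minimal textbook subset construction (illustrative sketch, not the
# dissertation's algorithm). Assumes an epsilon-free NFA whose transitions are
# given as a dict: (nfa_state, symbol) -> set of next NFA states.
from collections import deque

def subset_construction(nfa_start, nfa_trans, alphabet):
    start = frozenset([nfa_start])       # a DFA state is a set of NFA states
    dfa_states = {start}
    dfa_trans = {}                       # (dfa_state, symbol) -> dfa_state
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            nxt = frozenset(t for q in current
                            for t in nfa_trans.get((q, sym), ()))
            if not nxt:
                continue
            dfa_trans[(current, sym)] = nxt
            if nxt not in dfa_states:    # every new NFA-state subset becomes a
                dfa_states.add(nxt)      # fresh DFA state, hence the potential
                queue.append(nxt)        # exponential blow-up
    return dfa_states, dfa_trans
```

The dissertation's faster construction algorithm targets this process; the sketch only shows the standard baseline it is compared against.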
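The TCAM lookup model that the contributions build on can likewise be illustrated in software: the current state and the input byte are concatenated into a ternary key, and the highest-priority matching entry supplies the next state, so each input byte costs exactly one lookup. The bit widths, entries, and helper names below are hypothetical and only demonstrate how wildcard bits let one entry cover many transitions; they are not the dissertation's encoding.

```python
# Software illustration of a ternary (TCAM-style) lookup: each entry is a
# pattern over {0, 1, *} matched against the concatenation of state bits and
# input bits; the first (highest-priority) matching entry yields the next state.

def ternary_match(key: str, pattern: str) -> bool:
    """Bitwise match where '*' in the pattern matches either bit."""
    return len(key) == len(pattern) and all(
        p in ('*', k) for k, p in zip(key, pattern))

def tcam_lookup(entries, state_bits: str, char_bits: str):
    """entries: ordered list of (pattern, next_state); the first match wins,
    mimicking TCAM priority encoding."""
    key = state_bits + char_bits
    for pattern, next_state in entries:
        if ternary_match(key, pattern):
            return next_state
    return None

# Toy 2-bit states and 2-bit "characters": the wildcard in "001*" lets a single
# entry cover two transitions out of state 00, the kind of entry sharing that
# makes TCAM-based automata compact.
entries = [("001*", "01"), ("00**", "00"), ("01**", "00")]
assert tcam_lookup(entries, "00", "10") == "01"
assert tcam_lookup(entries, "00", "01") == "00"
```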
Keywords/Search Tags: NFA, DFA, TCAM, regular expression matching, deep packet inspection