
Optimized Regular Expression Matching Engine For Fast

Posted on: 2016-03-31 | Degree: Master | Type: Thesis
Country: China | Candidate: J M Wang
GTID: 2298330467994924 | Subject: Computer application technology
Abstract/Summary:
Since the birth of the Internet, network traffic classification has become one of the many network-related tasks and has gradually become a focus of research. Traffic classification and identification are the prerequisite and foundation for network traffic engineering, intrusion detection and prevention, packet filtering, network design and planning, and other tasks. Accurate and efficient traffic classification is therefore of great practical significance for analyzing network development and detecting abnormal network behavior.

This thesis studies fast traffic classification techniques, covering regular expression matching and big-data traffic classification on a cloud computing platform. NFAs (Non-deterministic Finite Automata) and DFAs (Deterministic Finite Automata) are the main techniques for regular expression matching. NFA matching is slow, so research has focused on the faster DFA, whose state transitions take O(1) time. DFAs, however, suffer from the space explosion problem, so space compression and optimization are needed. On the other hand, with the rapid growth of network traffic, traffic classification has entered the field of big data: standalone (single-machine) classification has become overwhelmed and inefficient, and new classification approaches for large traffic volumes are needed.

Specifically, this thesis focuses on the following two aspects:

(1) DFA space compression and matching-speed improvement for fast traffic classification. By studying the nature of DFA space explosion, we develop a suitable space compression algorithm and data structure for compressing states and transitions. After compression, however, querying a state transition is slower than indexing a traditional DFA state transition table, so matching becomes slower. We therefore improve speed in two ways: by speeding up successful matching and by speeding up failed matching.

(2) Hadoop-based network traffic classification. We use Hadoop Streaming to deploy the traffic classification system on the Hadoop platform instead of a single machine. More importantly, we solve the problem that Hadoop itself does not support binary traffic as an input format. Furthermore, we tune several parameters using Hadoop tuning techniques, which lets the system make better use of the cloud platform's parallelism.

Experiments show that the DFA optimization achieves a 99% compression ratio and matches 3 to 5 times faster than the original DFA. Furthermore, as traffic grows, the advantage of the Hadoop platform over a single machine becomes increasingly obvious. Together, these two techniques provide a reference for fast classification of large-scale traffic and for practical traffic classification, and have some application value.
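The contrast in aspect (1) between a plain DFA transition table (one O(1) lookup per byte) and a compressed one (smaller, but slower to query) can be illustrated with a minimal Python sketch. It assumes a single literal byte pattern instead of a full regular expression set, and it uses default-transition compression (the idea behind D²FA, the delayed-input DFA) as a stand-in, because the abstract does not describe the thesis's own compression algorithm. All function names here are hypothetical.

```python
from typing import Dict, List, Tuple

ALPHABET = 256  # one table column per possible byte value


def build_dfa(pattern: bytes) -> Tuple[List[List[int]], int]:
    """Full transition table for unanchored search of one literal byte string
    (KMP-style construction, a simplification of a regex DFA). State k means
    the last k input bytes equal pattern[:k]; state len(pattern) accepts."""
    assert pattern, "pattern must be non-empty"
    m = len(pattern)
    table = [[0] * ALPHABET for _ in range(m + 1)]
    table[0][pattern[0]] = 1
    restart = 0  # state the automaton falls back to on a mismatch
    for k in range(1, m + 1):
        table[k] = list(table[restart])      # inherit mismatch transitions
        if k < m:
            table[k][pattern[k]] = k + 1     # advance on the expected byte
            restart = table[restart][pattern[k]]
    return table, m


def match_full(table: List[List[int]], accept: int, data: bytes) -> int:
    """Count matches with exactly one table index per input byte."""
    state, hits = 0, 0
    for byte in data:
        state = table[state][byte]
        if state == accept:
            hits += 1
    return hits


def compress(table: List[List[int]]) -> Tuple[List[int], List[Dict[int, int]]]:
    """Default-transition compression (illustrative, not the thesis's method):
    each state keeps only the transitions that differ from an earlier
    'default' state sharing the most transitions with it; state 0 is kept whole."""
    default = [-1]
    sparse: List[Dict[int, int]] = [dict(enumerate(table[0]))]
    for s in range(1, len(table)):
        row = table[s]
        best = max(range(s), key=lambda d: sum(a == b for a, b in zip(row, table[d])))
        default.append(best)
        sparse.append({c: t for c, t in enumerate(row) if t != table[best][c]})
    return default, sparse


def step(default: List[int], sparse: List[Dict[int, int]], state: int, byte: int) -> int:
    """Follow default links until a state stores `byte` explicitly. State 0
    stores every byte, so the walk always ends; the walk itself is the extra
    cost compared with a single table index."""
    while byte not in sparse[state]:
        state = default[state]
    return sparse[state][byte]


def match_compressed(default: List[int], sparse: List[Dict[int, int]],
                     accept: int, data: bytes) -> int:
    """Same matching loop as match_full, but over the compressed automaton."""
    state, hits = 0, 0
    for byte in data:
        state = step(default, sparse, state, byte)
        if state == accept:
            hits += 1
    return hits


if __name__ == "__main__":
    tbl, acc = build_dfa(b"attack")
    dflt, sp = compress(tbl)
    stored = sum(len(r) for r in sp)
    print(stored, "of", len(tbl) * ALPHABET, "transitions stored")
    print(match_compressed(dflt, sp, acc, b"xxattackyyattattack"))
```

Both matchers report the same matches, but each compressed lookup may walk several default links. This is the speed penalty the abstract attributes to compression, and the reason it adds separate optimizations for successful and failed matching.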
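Aspect (2) relies on Hadoop Streaming, which feeds mappers and reducers newline-delimited text on standard input, so raw binary packets cannot be passed through unchanged. The abstract does not say how the thesis solves this; the sketch below assumes one common workaround, pre-encoding each flow as a text line of the form `flow_id<TAB>base64(payload)`. The signatures are hypothetical examples built from well-known protocol prefixes, and Python's backtracking `re` module merely stands in for the DFA matching engine.

```python
import base64
import re
import sys
from typing import Iterable, Iterator, Optional

# Hypothetical, illustrative signatures (not the thesis's rule set).
SIGNATURES = [
    ("http", re.compile(rb"^(GET|POST|HEAD|PUT|DELETE) \S+ HTTP/1\.[01]")),
    ("ssh", re.compile(rb"^SSH-\d\.\d+-")),
    ("bittorrent", re.compile(rb"^\x13BitTorrent protocol")),
]


def classify(payload: bytes) -> str:
    """Return the label of the first matching signature, else 'unknown'."""
    for label, sig in SIGNATURES:
        if sig.search(payload):
            return label
    return "unknown"


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: decode each 'flow_id<TAB>base64' line and emit 'label<TAB>flow_id'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        flow_id, _, encoded = line.partition("\t")
        try:
            payload = base64.b64decode(encoded, validate=True)
        except ValueError:  # binascii.Error subclasses ValueError
            continue        # skip malformed records
        yield f"{classify(payload)}\t{flow_id}"


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Reducer: count flows per label; input must arrive sorted by label,
    which Hadoop Streaming's shuffle provides to each reducer."""
    current: Optional[str] = None
    count = 0
    for line in lines:
        label, _, _ = line.rstrip("\n").partition("\t")
        if label != current:
            if current is not None:
                yield f"{current}\t{count}"
            current, count = label, 0
        count += 1
    if current is not None:
        yield f"{current}\t{count}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked as "script.py map" or "script.py reduce" from the streaming job.
    stage = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for out in stage(sys.stdin):
        print(out)
```

Such a script would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options; encoding to base64 inflates the input by about a third, which is one cost a dedicated binary input format would avoid.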
Keywords/Search Tags: traffic classification, regular expression, signature match, DFA match, Hadoop technology