Font Size: a A A

Algorithms To Speeding Up Multiple Regular Expressions Matching For Application Traffic Classification

Posted on:2012-01-03Degree:MasterType:Thesis
Country:ChinaCandidate:Q L LuoFull Text:PDF
GTID:2178330332993107Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Deep packet inspection (short for DPI) is one of the basic approach and key technologies of safeguarding the network information, the DPI technology of inspecting packet deeply on line becomes one of hot topics, in order to gear to backbone and core network speed needs. The application layer protocol identification, which aim to identify the application layer protocol is the network transmission data, is the based and key technology of Deep packet inspection. However, the existing DPI theory and algorithm assumed that the detected network flow is static and random data, which cannot adapt the real changing network flow. Low speed and large storage space cause the contradiction and problem between speed and performance in DPI system, and also is the basic reason that hard to implement the high speed on line. In this paper, the main research is about the application-layer protocol identification with multiple regular expression in DPI. From the point of the real network flow properties and application-layer protocol distribution, this paper studied real network environment factors affecting the DPI detection speeds. The main contributions of this paper are as follows:Firstly, we uses the actual network traffics and reassembles the traffics based on protocol of TCP, then our datasets use these actual reassembled TCP flow instead of random and static data used by many traditional researched method. Network traffic flows have been studied and flow characteristic has been analyzed. We find that the network flow distribution is uneven, and the protocol probability distribution is similar to the Eighty-twenty rule. The achievement is the foundation and basis for proposing the novel grouping algorithm based on probability.Secondly, a novel grouping algorithm based on the probability of regular expressions was proposed, and compare scanning time and DFA-scheduling times with no-grouping algorithm. In this algorithm we also studied the performance index include grouped number, total state number, and scanning time by setting different grouping thresholds. More significantly, we carried out a serious of contrast experiments, using our algorithm and the heuristic algorithm, proposed by Fangyu, which has great contribution to the RE grouping algorithm. And Test results show that our algorithm has better effects. Especially, when the detected data flow has higher rate of identification, our method can improve efficiency several times.Lastly, proposed adaptive method of regular expression grouping based on the structure of splay Tree, in order to auto adapt the dynamic changing network traffics. It employs the splay Tree instead of traditional a linked list to schedule the automation (this paper uses DFA) built by the regular expression. The splay algorithm inserts the automation into a complete binary tree, and ensures the frequent items have priority hit by moving the hit node near to the root node.Experimental results show that Splay Tree method not only improves the rate of application-level protocol identification, but also auto adapts the dynamic changing traffic flows.
Keywords/Search Tags:Protocol identification, regular expression, Deterministic Finite Automation (DFA), Splay Tree, adaptive
PDF Full Text Request
Related items