
Research On Technique Of Application-Layer Protocol Identification Based On Regular Expressions

Posted on: 2009-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: J C Liu
GTID: 2178360278956993
Subject: Computer Science and Technology
Abstract/Summary:
Accurate identification of network protocols is the basis for bandwidth management, intrusion detection, firewalls, and content auditing. Traditionally, protocol identification relied on a port-based mapping mechanism. However, its accuracy no longer meets application demands because an increasing number of protocols use dynamic ports. This thesis conducts an in-depth study of protocol recognition methods and, according to practical needs, selects the method based on regular expressions. Regular expressions, which serve as patterns for the application-layer data of each protocol, are used to inspect packets and distinguish protocol types. The main contributions of the thesis are as follows.

(1) A comparative analysis of the NFA matching engine and the DFA matching engine is performed, and DFA-based regular expression matching is adopted.
(2) A three-division compression method is proposed to address the memory expansion of DFA transition tables. The transition table is divided into three tables, and the common elements in every row and every column of each table are compressed. Tests on the regular expressions in L7-filter indicate that the compression ratio is above 95%.
(3) A new grouping method is proposed, based on the initial characters of the strings matched by the regular expressions. This method reduces the memory size of the DFA without affecting packet matching speed. Tests on 13 common protocols show that the memory size is only 24.2% of that before grouping.
(4) Taking the accuracy, throughput, and flexibility of the protocol identification system into account, the system adopts a combined software-hardware framework. With this framework, patterns can be compiled automatically and updated quickly, and packets can be matched at high speed.
(5) A protocol identification system based on regular expressions is implemented on a gigabit network card. Test results show that the system can identify protocols online on a gigabit network.
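
The core idea of the abstract, classifying traffic by matching payload bytes against per-protocol regular expressions instead of trusting port numbers, can be sketched as follows. This is a minimal illustration only: the three signatures and the function name `identify_protocol` are hypothetical and simplified, not the L7-filter patterns or the thesis's implementation, and Python's `re` module is a backtracking matcher rather than the compressed DFA engine the thesis describes.

```python
import re

# Hypothetical, simplified signatures for illustration only.
# They are NOT the L7-filter patterns evaluated in the thesis.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) [^ ]+ HTTP/1\.[01]\r\n"),
    "ssh": re.compile(rb"^SSH-[12]\.[0-9]+-"),
    "smtp": re.compile(rb"^220 [^\r\n]*SMTP", re.IGNORECASE),
}


def identify_protocol(payload: bytes) -> str:
    """Return the name of the first signature that matches the payload.

    The port number is deliberately not consulted: only the
    application-layer bytes decide the result.
    """
    for name, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return name
    return "unknown"


if __name__ == "__main__":
    samples = [
        b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
        b"SSH-2.0-OpenSSH_8.9\r\n",
        b"220 mail.example.com ESMTP Postfix\r\n",
        b"\x00\x01\x02\x03",
    ]
    for payload in samples:
        print(identify_protocol(payload))
```

Trying each pattern in turn costs time proportional to the number of signatures. A DFA-based engine of the kind the thesis adopts avoids that by scanning each payload byte once, at the price of larger transition tables, which is what the compression and grouping contributions address.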
Keywords/Search Tags: protocol identification, regular expression, Deterministic Finite Automaton (DFA), transition table compression