Font Size: a A A

Research On Regular Expression Grouping Based On Simulated Annealing Algorithm

Posted on:2020-07-22Degree:MasterType:Thesis
Country:ChinaCandidate:Z Q MaoFull Text:PDF
GTID:2428330599954615Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
The traditional deep packet detection technology detect computer viruses by the precise string description and this method is very inefficient and can not be applied to the Internet data flow.Regular expressions have the characteristics of simplicity,high efficiency and strong expression ability,and it is especially suitable for application in deep packet detection.In practice,the characteristic that multiple regular expressions can be combined to generate a DFA engine can improve the matching efficiency.But it will case "state explosion" when multiple regular expressions are parsed into a DFA engine.In the worst case,it will cause the DFA state number to grow exponentially and it will case the problem that DFA engine can not be generated in the ordinary hardware platform in theory.Research shows that a reasonable grouping of regular expressions is an effective way to solve this problem.We can divide regular expressions which cause "state explosion" into different groups and divide other regular expressions into the same group.The goal of grouping is to obtain the least total state number of DFA of every group,the least number of groups and the minimum standard deviation of each group state number in the shortest time.However,the existing grouping algorithms are inefficient,have large standard deviation of each group state,and have large sum of DFA states,so the grouping results are not ideal.It is found that most regular expression grouping algorithms need to build the DFA engine continuously to determine whether there is a "state explosion",which leads to the long grouping time of the algorithm.In the second chapter,in order to improve the grouping efficiency,this paper improves the prediction formula of the regular expression DFA state number and proposes a grouping algorithm based on the estimate expansion rate(GRE-EER).It guides the grouping according to the estimated expansion rate between regular expressions and doesn't need to build DFA engine.In order to obtain the accurate grouping results,a grouping algorithm based on the real expansion rate of regular expressions(GRE-RER)is proposed.GRE-RER guides the grouping according to the real expansion rate between regular expressions.It can obtain accurate grouping results.In order to minimize DFA storage space,A local optimization algorithm is added.In practice,we combine GRE-EER,GRE-RER with local optimization algorithm to propose GRE-EER-RER strategy in order to get a good grouping result.In order to make the algorithm have global search ability,we combine simulated annealing algorithm and GRE-EER-RER to propose a regular expression grouping algorithm based on simulated annealing algorithm(GRE-SAA)in the third chapter.The simulated annealing algorithm search the optimal solution in the global scope and generate new solutions,and these new solutions will be grouped by GRE-EER firstly.And then they will be grouped by GRE-RER when the grouping results of GRE-EER meet certain conditions.In order to make it suitable for large-scale rule set,the GRE-SAA is improved according to the characteristics of large-scale rule set.Experimental results show that GRE-SAA have strong grouping ability on small-scale rule set,medium scale rule set and large-scale rule set,and it can get better result in the DFA engine state number,grouping time and the standard deviation of DFA state number than other global search algorithms.In order to further improve the convergence ability and search ability of the algorithm,a regular expression grouping algorithm based on genetic simulation annealing algorithm(GRE-GASA)which combine simulated annealing algorithm,genetic algorithm and GRE-EER-RER is proposed.We design experiments to compare it with GRE-SAA,and the results show that for small-scale set and medium scale set,the convergence ability and grouping results of the GRE-GASA is better than GRE-SAA,but the efficiency of GRE-GASA is less than GRE-SAA.So it is recommended to use GRE-GASA where group efficiency requirements are not high,otherwise it is recommended to use GRE-SAA.Finally,we improve it and design experiments in order to make the GRE-GASA suitable for large-scale rule set.The experimental results show that grouping result of GRE-GASA and GRE-SAA are similar and they are better than Becchi algorithm for large-scale rule set.
Keywords/Search Tags:regular expression, deep packet detection, simulated annealing algorithm, grouping algorithm, genetic simulated annealing algorithm
PDF Full Text Request
Related items