Font Size: a A A

Performance Optimization Of The Multi-Pattern Matching Algorithm For The Large-scale URL Keywords

Posted on:2012-04-25Degree:MasterType:Thesis
Country:ChinaCandidate:L LiFull Text:PDF
GTID:2218330362950412Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
String matching algorithm has always been a research focus in computer science. Higher requirements have been put forward for the performance of string matching algorithms due to the explosive growth of data. In information security area, the large-scale of keywords, real-time demanding and complex matching data make the huge memory consumption of large-scale URL keywords matching algorithms the performance bottlenecks in the information security system like intrusion detection.In this paper, several string matching algorithms are studied. In particular, the advantages and disadvantages of the classical algorithms are specially analyzed and compared. After deeply studying the large-scale URL keywords lengths features and the matching requirements, it is summarized that long length URL keywords are more, short keywords are less and the keywords with'and'expression requests have a small scale. A high performance algorithm is proposed called PMUC(Multi-pattern Matching Algorithm for URL Based on Classification) for the larger-scale URL keywords matching according to these characteristics. The purpose of performance optimization can be achieved by classifying the URL keywords into different kinds to combine the advantages of AC algorithm and Wu-Mamber algorithm. The classical AC algorithm and the Wu-Mamber algorithm are also been improved, the keywords which have short length and have the demand of matching with'and'expression use the GFAM algorithm which is the improving AC for matching, others use WMS algorithm which is the improving Wu-Mamber for matching.Multi-pattern matching module for the large-scale URL keywords which has been optimized are implemented based on the PMUC algorithm and it is added to a scalable efficient intrusion monitoring system to test the performance of the algorithm. The off-line tests firstly validate the algorithm correctness. After verifying the correctness, performance comparison results between the performance optimization module and the original module are provided. Classification parameters including the classification length'm'and the automatic depth'D'are adjusted, the performance effect of the algorithm after adjusting parameters is tested, experienced parameter value based on 140,000 scale is provided. Real network dynamic data is used in the on-line testing to prove that the performance optimization for the large-scale URL keywords matching module has practical application value. The test results show that the consumed memory of the performance optimized matching module can be compressed to less than 5% of the original module, the initialization time of large-scale URL keywords matching module is significantly shortened than that of before.
Keywords/Search Tags:pattern matching, large-scale URL keywords matching, AC algorithm, Wu-Mamber algorithm, memory compression
PDF Full Text Request
Related items