Font Size: a A A

Research On High-performance Online Pattern Matching Algorithm

Posted on:2016-04-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:D L XuFull Text:PDF
GTID:1108330479978724Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The rapid growth of Internet Traffic has emerged as a major issue due to the rapid development of various network applications and network communication technology. In China international bandwidth has been rapidly growing by 80% per year and has reached 3688 Gbps so far. In addition, network attacks have become more and more diverse. Network and information security technology challenges exist. High-performance low-memory pattern-matching technology has become one of the key technologies used to monitor the large flow of network information in real time.First, In order to improve the performance of serial pattern matching algorithms, a parallel algorithm called Parallel Extended Bloom Filter(PEBF) for exact multi-pattern matching based on bloom filter is proposed. We divided the pattern set into N subsets and constructed an extended bloom filter for each subsets which are processed by N threads simultaneously. We implemented our solution on the graphics processing unit(GPU) called G-PEBF. Experimental results on GPU show that the performace of G-PEBF is 10 times more effective than that of PEBF in the worst caseSecond, In order to realize the parallelization of serial pattern matching algorithm, vector-based and matrix models are proposed in this paper. Based on these two models, we further proposed the vector-based single-pattern matching and the matrix-based multi-pattern matching(MBMP) algorithms, as well as the matrix-based multi-pattern approximate(MBMPA) and the matrix-based multi-pattern exact(MBMPE) algorithms. The G-MBMP algorithm refers to the implementation of the MBMP algorithm on a GPU. Experimental results show that the efficiency of the G-MBMPA and G-MBMPE algorithm is 1.5 times of the comparison algorithms, respectively. It shows that the matrix model proposed can address the processing parallel pattern-matching problem.Third, the memory usage and conflict rate of the pattern-matching algorithm for millions of patterns is higher. To address the issue, we propose a randomizing fingerprint-based Wu-Manber(RFP-WM) algorithm that can effectively reduce false positive rate by calculating a unique fingerprint for each pattern. Compared with the WM algorithm, the RFP-WM algorithm greatly reduces the hash collision rate and increases the hit rate. The efficiency of RFP-WM algorithm is improved by 48.1%-65.3% on 5 test data sets.Fourth, to address the issue of pattern matching with URL as the pattern, we fully utilise the URL characteristic to implement the longest prefix matching efficiently. A scalable multi-hash(SMH) name lookup method is proposed. SMH is based on hierarchical name decomposition to aggregate names that share common prefixes and multiple scalable hash tables to minimise collisions among prefixes. The component is taken instead of the entire name as a key in the hash functions. First, URL are decomposed into components according to the hierarchical structure because each component in an URL is known. Second, a hash table is created for each layer of components. A hash is computed over the first component, and the result is a pointer to the next hash table, which is keyed with the hash of the second component, and so on. To eliminate false positives and improve the search efficiency, each component and name ID to which the component belongs is stored in the corresponding hash buckets for the probe. The experimental results show that the performance of the SMH algorithm is better than that of the NCE, CT and BH algorithms, which are the classic longest prefix matching algorithms. And the SMH algorithm is good at reducing memory consumption, has good scalability, which is suigtable for handling large-scale URL pattern set.
Keywords/Search Tags:pattern matching, online matching, matrix, GPU, fingerprint model, URL longest prefix matching
PDF Full Text Request
Related items