Font Size: a A A

Multi-pattern String Matching Optimization For The Ten Million Scale Network

Posted on:2018-11-19Degree:MasterType:Thesis
Country:ChinaCandidate:Z X DongFull Text:PDF
GTID:2348330533969078Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet,it is very important to discover the information fast and accurately from massive data,which requires multi-pattern string matching algorithms.Meanwhile,there are lots of users in the Internet,which will make the security issues increasingly important.Consi dering detecting network attacks as soon as possible,it is necessary to compare and analyze relevant behaviors in the Internet by multi-pattern matching algorithms.Multi-pattern string matching algorithm has been used in many fields,such as information security,intrusion detection system,medicine,data mining,information retrieval,text editor,biology,object recognition and so on.The paper analyzes several mainstream and common multi-pattern string matching algorithms.Aiming at the characteristic s of the data stream in the large scale network,if the pattern string set is small,the AC_QS algorithm will be chosen.However,when the pattern string set is large(up to ten thousands or more),the memory cost is not sufficient to run the AC_QS algorith m.AC automata algorithm is the first efficient algorithm for linear string matching with short matching time in such a large number of patterns.AC_QS algorithm is based on the AC algorithm to increase the bad character rules and further speed up the AC a lgorithm matching,but its space complexity is a little high.Based on the AC_QS algorithm,the paper mainly improves pattern matching from pattern preprocessing,dictionary Trie tree and matching process.The optimization of the pattern string preprocessi ng mainly shows the jump distance when the character matching fails,and enhances the algorithm efficiency.The optimization of the dictionary Trie tree is mainly performed via ordered tree to store the dictionary tree and reduce the number of characters t hat should compare.Optimization of the matching process in the matching process,if found some current characters may not match a pattern string,it will not compare the pattern string wtih the current characters,to reduce comparison times.In this paper,DKR algorithm calculate the string hash using the appropriate properties of the hash function.In this paper,we first design the hash function,and then classify these patterns.According to the designed hash function,we find the hash value of the half pattern string,and then divide the text to the target text segment according to the reference length of the mode string.Because of judging whether the target segment contains half segments of the pattern string each time,the distance of each jump is half of the reference length of the mode string,which greatly increases the matching efficiency of the algorithm.In summary,the design of the multi-pattern matching algorithm DAC_QS algorithm and DKR algorithm is suitable for the large scale network.The DAC_QS algorithm ensures efficient operation of the algorithm,but requires much memory.DKR algorithm is mainly used in the case that DAC_QS algorithm can not provide sufficient memory support.Theoretically the DKR algorithm can deal with that no matter how large the pattern string set,since the algorithm's space complexity is quite small.Through the experiments,the algorithms are verified and the analysis of the algorithm complexity in time and space related to performance matrics is presented.
Keywords/Search Tags:multiple patterns, pattern matching, AC algorithm, QS algorithm, KR algorithm, Hash function
PDF Full Text Request
Related items