Font Size: a A A

Research On Optimization Of The Storage Space In Deterministic Finite Automaton

Posted on:2016-09-27Degree:MasterType:Thesis
Country:ChinaCandidate:H Y ShenFull Text:PDF
GTID:2298330467498860Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of computer network technology, Computer networks havebecome an essential part of people and the attendant problem of network security is alsobecoming seriously. Like network attacks, network fraud, even the lost of user’s account andpassword.Usually, all the malicious codes have been encapsulated into network packets.Howto find the malicious codes efficiently has become an important issue in the field of computersecurity. So regular expression is emerged, which using its own syntax rules to describe aparticular string of data sets and provide the final search results. As a very important part inthe pattern matching field.Among most text-based editors and searching tools, the regularexpression makes data processing more convenient and more efficient. Regular expressionsare usually implemented by Deterministic Finite Automaton (DFA) or non-DeterministicFinite Automaton (NFA). Classical Thompson automata and Glushkov automata have to takea lot of storage space to support the state transitions and the processing of the state transitionsis also very complex.As a result, The time complexity and space complexity of the automataare not satisfactory.Recently,the method which based on bit-parallel is making patternmatching completed by bit-vector.The bitwise operation accelerates the processing speed ofautomaton and the bit-vector reduces the storage space of automaton.Utilizing bit-parallelalgorithm,We can implemente automaton which processing speed can be further improvedand storage space can also be further reduced.In this paper,Considering the time ofconstructing Glushkov automata is faster. To use less storage space,Glushkov automata whichbased on bit-parallel is selected as the researching object. State transition table which is alsocalled T table is an important part of Glushkov automata.It takes up the most space of datatables in automaton.Based on the original bit-parallel algorithm, we propose two methods tocompression the space of T table: One is by constructing a shift function and an array tocompression all the entries in T table. This method is simple, but the same value of entrieswill result in data redundancy. Another is using MPHF (minimal perfect hash function)which does not have duplicate entries.In other words, all the entries are optimized.Minimalperfect hash function is based on the undirected graph.To make this method faster,we choosea faster perfect hash algorithm called RAM Algorithm.Through experiment we find out thestorage space of T table is less than the actual storage space about2-3times and using hash function occupy less storage space. Although perfect hash function provides the perfectset-one mapping method, but the implementation is relatively complicated. While trying tosolve the problem which is how to use string to generate the corresponding hash value, ourgoal is to make the algorithm as simple as possible, taking a shorter processing time, and toensure that the hash value is not repetitive. With the upgrading of computer graphicsprocessor, GPU-based parallel computing makes the high-speed processing of large-scale dataeasier. Considering that the NVDIA ’s CUDA parallel computing platform reduces thecomplexity of programming, this paper selects eight efficient hashing algorithms andimplements these eight algorithms on the CUDA platform. After analyzing the algorithmexecution time of CPU and GPU, we find that GPU-based multi-threaded programmingmodel can accelerate the process of implementation and reduce the processing time of theeight hashing algorithms.
Keywords/Search Tags:DFA, Perfect hash function, Storage space, Compression Algorithm
PDF Full Text Request
Related items