
High Speed Parallel Algorithms Research On GPU

Posted on: 2015-04-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zu
Full Text: PDF
GTID: 1108330473461528
Subject: Information security
Abstract/Summary:
The graphics processing unit (GPU), originally designed for graphics rendering, has gained popularity in general-purpose computing thanks to its powerful floating-point performance and massive parallelism, especially in scientific computing. General-purpose computing on GPUs has become a hot topic in both industry and academia.

Facing the dramatic increase of network traffic and the growing complexity of packet processing, network equipment now needs far more computing power, and the GPU has consequently been introduced into the networking domain. Scientific computing mainly consists of computation-intensive problems with easy-to-exploit data parallelism, whereas network processing is dominated by memory-intensive or I/O-intensive problems whose data parallelism is hard to exploit. Therefore, to fully utilize the computing power of the GPU in network processing, the existing network algorithms must be redesigned and parallelized to fit the architecture and characteristics of the GPU.

In this dissertation, two problems that still lack efficient GPU solutions, regular expression matching and lossless data compression, are selected and studied.

Regular expression matching faces two conflicting goals, high matching speed and memory efficiency, and they are hard to reconcile in practice on either CPU or GPU. DFA-based solutions suffer from an exponentially exploding state space, which cannot be contained without sacrificing matching speed; NFA-based methods are memory-efficient but suffer from low and unstable matching speed. Based on an in-depth understanding of NFA properties as well as the GPU architecture, a new NFA-based implementation is proposed that is both fast and memory-efficient.
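The abstract only states the design goals. As a rough illustration (not the dissertation's implementation) of why an NFA keeps memory linear in the pattern size while paying a per-byte cost for it, the following CUDA sketch tracks the active-state set of a small NFA as a bitmask, one thread per packet. The kernel name, the data layout and the 32-state limit are assumptions made for brevity.

```
// Hypothetical sketch: one thread scans one packet against a small NFA
// (at most 32 states). next_mask[state][byte] is the precomputed set of
// states reachable from `state` on input `byte`, stored as a 32-bit
// bitmask, so memory grows linearly with the number of NFA states.
__global__ void nfa_match_kernel(const unsigned char *packets,
                                 const int *offsets,            // packet start offsets
                                 const int *lengths,            // packet lengths
                                 const unsigned int *next_mask, // [32][256], flattened
                                 unsigned int start_mask,       // initial active states
                                 unsigned int accept_mask,      // accepting states
                                 int *match,                    // 1 if the packet matches
                                 int num_packets)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= num_packets) return;

    const unsigned char *p = packets + offsets[tid];
    unsigned int active = start_mask;
    int hit = 0;

    for (int i = 0; i < lengths[tid] && !hit; ++i) {
        unsigned int next = start_mask;   // restart states: match at any offset
        unsigned int cur = active;
        unsigned char c = p[i];
        // Expand every currently active state; this per-byte loop over the
        // active set is what makes NFA simulation slower than a DFA lookup.
        while (cur) {
            int s = __ffs(cur) - 1;        // index of the lowest set bit
            cur &= cur - 1;                // clear that bit
            next |= next_mask[s * 256 + c];
        }
        active = next;
        hit = (active & accept_mask) != 0;
    }
    match[tid] = hit;
}
```

A DFA would replace the inner loop over active states with a single table lookup per byte, which is exactly the speed-for-memory trade-off described above.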
Lossless data compression is inherently data-dependent, both in dictionary-based methods and in statistical methods, and the single-instruction-multiple-data execution model of the GPU makes it even harder to parallelize. In this thesis, two typical compression algorithms, the LZSS compression algorithm and the Huffman coding algorithm, are studied and adapted to the GPU architecture, and a parallelized Deflate data compression algorithm is built on top of them.

The main contributions and innovations are summarized as follows.

1. To achieve high-speed and memory-efficient regular expression matching, an NFA-based method is adopted for its low space complexity. The task distribution among GPU threads is optimized by introducing compatible groups, compatible super groups and virtual NFAs, and memory access is optimized through interwoven packet storage and coalesced global memory access. The evaluations show that the matching speed reaches 10 Gbps while the state space still grows linearly.

2. To parallelize LZSS, the dictionary-based lossless compression algorithm, on the GPU, its dictionary is implemented with a hash table. Moreover, through careful design of the data structures and the algorithm, the serialization of thread execution during parallel LZSS compression is eliminated and the GPU computing power is fully utilized. This design outperforms the best existing GPU-based LZSS implementation in both compression speed and compression ratio.

3. The Huffman coding algorithm is parallelized in the context of the Deflate algorithm. Its three main stages, computing the histogram, constructing the Huffman tree and variable-length encoding, are all parallelized through careful algorithm design and CUDA atomic operations (a sketch of the histogram stage is given after this summary). This is the first work in the literature to parallelize the Deflate algorithm on the GPU. The experimental results show that the compression ratio is close to that of Deflate and that the compression speed outperforms a Deflate implementation on a quad-core CPU.

The research in this dissertation not only efficiently parallelizes regular expression matching and lossless data compression on the GPU, but also provides methodological guidance and a technical reference for making other algorithms run efficiently on the GPU.
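As an illustration of the histogram stage mentioned in contribution 3, here is a minimal CUDA sketch of byte-frequency counting with atomic operations. It is not taken from the thesis; the kernel name and the shared-memory privatization are assumptions. The counts it produces are the input to Huffman tree construction.

```
// Hypothetical sketch: per-block shared-memory histograms merged with
// atomicAdd, the standard CUDA pattern for the byte-frequency counts
// that Huffman tree construction starts from.
// global_hist is assumed to be zeroed (e.g. with cudaMemset) before launch.
__global__ void byte_histogram_kernel(const unsigned char *data, size_t n,
                                      unsigned int *global_hist /* [256] */)
{
    __shared__ unsigned int local_hist[256];

    // Zero the block-local histogram.
    for (int i = threadIdx.x; i < 256; i += blockDim.x)
        local_hist[i] = 0;
    __syncthreads();

    // Grid-stride loop: each thread counts its share of the input bytes.
    size_t idx = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = idx; i < n; i += stride)
        atomicAdd(&local_hist[data[i]], 1u);
    __syncthreads();

    // Merge the block-local counts into the global histogram.
    for (int i = threadIdx.x; i < 256; i += blockDim.x)
        atomicAdd(&global_hist[i], local_hist[i]);
}
```

The later stages, tree construction and the bit-level variable-length encoding, require more coordination between threads; the abstract only summarizes that they were parallelized as well.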
Keywords/Search Tags: regular expression matching, NFA, lossless data compression, LZSS, Huffman coding, Deflate, CUDA, GPU