
Parallel Design For Image Compression Algorithms On GPU

Posted on: 2015-09-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Dai
Full Text: PDF
GTID: 1228330467456549
Subject: Agricultural Electrification and Automation
Abstract/Summary:
Image compression coding can effectively reduce the redundant information of image content, which saves storage space and transmission bandwidth. When image compression technology is used for information processing, both a high compression ratio and a fast processing speed are required. Therefore, a key issue in image compression technology is how to improve the efficiency of compression. Today, GPU parallel computing has become easier with the development of CUDA, which defines an API and provides a runtime environment and libraries that make it easy to program recent GPUs. We therefore focus on three image compression algorithms with high complexity and develop parallel design strategies for these algorithms on the GPU. Meanwhile, several CUDA parallelization techniques are also exploited to further boost the performance of these algorithms on the GPU. The main contributions and innovations of the dissertation are as follows:

(1) The main drawback of the error-resilient entropy coding (EREC) algorithm is its high complexity. To overcome this disadvantage, a parallel EREC is implemented on a GPU using NVIDIA CUDA technology. The original EREC only admits fine-grained parallelism within each stage, which brings additional communication overhead. To achieve high parallel efficiency, we propose the partitioning EREC (P-EREC) algorithm, which splits the variable-length blocks into groups and then codes every group with the EREC separately. Each GPU thread processes one group, making the EREC coarse-grained parallel (a simplified kernel sketch is given after item (4)).

(2) In addition, shared memory, texture memory, constant memory, careful thread allocation, and other optimization techniques are fully used to obtain higher performance of the P-EREC on the GPU. In the case that the variable-length data blocks are divided into 128 groups (256 groups, resp.), experimental results show that the parallel P-EREC achieves to (to, resp.) speedup over the original C code of the EREC. Compared with the EREC, the P-EREC not only achieves a good speedup on the GPU but also slightly improves the resilience of the variable-length code (VLC) bit-stream against burst or random errors.

(3) The major shortcoming of the two-dimensional orthogonal matching pursuit (2D-OMP) algorithm is its long computing time for signal recovery. To overcome this disadvantage, a novel parallel design strategy for the 2D-OMP algorithm is developed on a GPU. We first analyze the complexity of the 2D-OMP and point out that its bottlenecks lie in the matrix inverse of the weight-update module and in the projection module. We therefore adopt a matrix inverse update strategy, whose performance is superior to traditional methods, to reduce the complexity of the original matrix inverse; the projection then becomes the most time-consuming module. Hence, a fast parallel matrix-matrix multiplication is leveraged to accelerate the projection on the GPU (see the tiled multiplication sketch after item (4)).

(4) Moreover, a faster matrix-vector multiplication, a parallel reduction algorithm, and some other parallel techniques are also exploited to further boost the performance of the 2D-OMP on the GPU. In the case of the sensing matrix with size of (, resp.) for a scale image, experimental results show that the parallel 2D-OMP achieves to (to, resp.) speedup over the original C code. When the sensing matrix is for a image, the parallel 2D-OMP gains to speedup over the serial code on the CPU.
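A minimal sketch of the coarse-grained mapping described in item (1), assuming one GPU thread per group and a simplified EREC that only tracks bit counts rather than moving the actual bitstream. The group size, offset sequence, and kernel name are illustrative assumptions, not taken from the dissertation.

```cuda
#include <cuda_runtime.h>

#define BLOCKS_PER_GROUP 16   // assumed group size, for illustration only

// One thread codes one group of variable-length blocks with a simplified
// EREC: stage 0 places each block in its own equal-size slot, and stage k
// lets block i offer its surplus bits to slot (i + k) mod N.
__global__ void p_erec_kernel(const int *blockLen, int *placed, int numGroups)
{
    int g = blockIdx.x * blockDim.x + threadIdx.x;   // group index
    if (g >= numGroups) return;

    const int *len = blockLen + g * BLOCKS_PER_GROUP;
    int *out = placed + g * BLOCKS_PER_GROUP;

    // Equal slot size, rounded up so the whole group always fits.
    int total = 0;
    for (int i = 0; i < BLOCKS_PER_GROUP; ++i) total += len[i];
    int slot = (total + BLOCKS_PER_GROUP - 1) / BLOCKS_PER_GROUP;

    int remain[BLOCKS_PER_GROUP];   // surplus bits of block i still to place
    int space[BLOCKS_PER_GROUP];    // free bits left in slot i
    for (int i = 0; i < BLOCKS_PER_GROUP; ++i) {
        remain[i] = max(len[i] - slot, 0);
        space[i]  = max(slot - len[i], 0);
        out[i]    = min(len[i], slot);   // bits placed in block i's own slot
    }

    // Remaining EREC stages: search the other slots for free space.
    for (int k = 1; k < BLOCKS_PER_GROUP; ++k) {
        for (int i = 0; i < BLOCKS_PER_GROUP; ++i) {
            if (remain[i] == 0) continue;
            int j = (i + k) % BLOCKS_PER_GROUP;
            int moved = min(remain[i], space[j]);
            remain[i] -= moved;
            space[j]  -= moved;
            out[i]    += moved;          // bookkeeping only, no bit copy
        }
    }
}

// Example launch:
// p_erec_kernel<<<(numGroups + 255) / 256, 256>>>(d_len, d_placed, numGroups);
```

In the real algorithm each thread would also copy coded bits between slots; the sketch only illustrates the one-group-per-thread mapping that makes the EREC coarse-grained.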
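The projection step in item (3) is accelerated with a parallel matrix-matrix multiplication. Below is a standard shared-memory tiled kernel of the kind that could serve this purpose; the tile size, kernel name, and row-major layout are assumptions, and the dissertation's actual kernel may differ.

```cuda
#include <cuda_runtime.h>

#define TILE 16   // assumed tile width

// C = A * B with A (M x K), B (K x N), C (M x N), all row-major.
// Each thread block computes one TILE x TILE tile of C, staging tiles of
// A and B through shared memory to reuse global-memory loads.
__global__ void matmul_tiled(const float *A, const float *B, float *C,
                             int M, int K, int N)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}

// Example launch:
// dim3 block(TILE, TILE);
// dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
// matmul_tiled<<<grid, block>>>(d_A, d_B, d_C, M, K, N);
```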
(5) The joint-bitplane low-density parity-check (LDPC) decoding algorithm suffers from long decoding time. Motivated by the evolution of the GPU and the inherent parallelism of the decoding algorithm, we propose a novel approach for the computationally intensive processing of the joint-bitplane LDPC decoding algorithm on the GPU. Two different parallel models are utilized for belief passing between the different node types of the algorithm: one is the parallel belief propagation (BP) between bit nodes and check nodes, and the other is the parallel BP between symbol nodes and bit nodes.

(6) We find that the bottlenecks of the joint-bitplane LDPC decoding algorithm lie in computing the overall probability mass functions (pmfs) of the symbol nodes and the overall beliefs of the bit nodes. A data partitioning method is therefore applied in both models to split large arrays into small pieces that can be loaded into the L1 cache instead of global memory. Moreover, coalesced memory access and shared memory are exploited to further optimize the performance of the joint-bitplane LDPC decoding algorithm (a reduction sketch follows this item). Overall, experimental results show that when the length-6336 LDPC accumulate (LDPCA) code compresses the source, the joint-bitplane LDPC decoder achieves about speedup on the GPU compared with the original C code on the CPU, while the length-50688 LDPCA code with joint-bitplane LDPC decoding runs about 41 times faster on the GPU than on the CPU.
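A minimal sketch of the shared-memory parallel reduction mentioned in items (4) and (6), written here as if one thread block accumulates the overall belief of one bit node from its incoming messages. The array layout, kernel name, and the use of a plain sum are illustrative assumptions, not the dissertation's actual implementation.

```cuda
#include <cuda_runtime.h>

// One thread block accumulates the overall belief of one bit node by a
// tree reduction over its incoming messages in shared memory.
// messages:      msgsPerNode incoming values per node, stored contiguously
// overallBelief: one output value per bit node
__global__ void accumulate_beliefs(const float *messages, float *overallBelief,
                                   int msgsPerNode)
{
    extern __shared__ float sdata[];
    int node = blockIdx.x;
    int tid  = threadIdx.x;
    const float *msg = messages + (size_t)node * msgsPerNode;

    // Strided load: consecutive threads read consecutive elements (coalesced).
    float v = 0.0f;
    for (int i = tid; i < msgsPerNode; i += blockDim.x)
        v += msg[i];
    sdata[tid] = v;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) overallBelief[node] = sdata[0];
}

// Example launch, assuming a power-of-two block size:
// accumulate_beliefs<<<numBitNodes, 128, 128 * sizeof(float)>>>(d_msgs, d_beliefs, msgsPerNode);
```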
Keywords/Search Tags: Image compression, parallel processing, GPU, CUDA, EREC, 2D-OMP, joint-bitplane LDPC decoding algorithm