
Parallel Design Of JPEG-LS Encoder Based On CUDA

Posted on: 2014-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Duan
GTID: 2268330401973734
Subject: Computer system architecture
Abstract/Summary:
JPEG-LS is a standard established by ISO/ITU for lossless and near-lossless compression of continuous-tone images. Its core is the LOCO-I algorithm developed by HP Labs. With the rapid development of GPU technology, the processing power and programmability of GPUs have greatly improved, making it possible to use the GPU as a general-purpose computing platform. CUDA is a general-purpose parallel computing architecture developed by NVIDIA Corporation. With CUDA, parallel programs can be developed rapidly to solve complex computational problems. Although the computational complexity of JPEG-LS is low, for images with large amounts of data the compression time still struggles to meet the real-time requirements of some specialized fields. To address this problem, this thesis designs and implements a parallel JPEG-LS encoder based on CUDA. The main research contents and conclusions are as follows:

(1) The parallelism of the LOCO-I algorithm is analyzed from the perspectives of task parallelism and data parallelism. For task parallelism, because the data dependence within LOCO-I is high and the task is difficult to partition effectively, the ideal degree of parallelism cannot be achieved. For data parallelism, partitioning the data greatly increases the parallelism of encoder execution. Hence, the thesis adopts a data-parallel strategy to design the parallel JPEG-LS encoder.

(2) A parallel JPEG-LS encoder based on the CUDA programming model is designed and implemented. A two-layer structure maps CUDA blocks and threads to sub-image blocks. After testing different data block sizes, 64×64 pixels is selected as the basic unit of image partition. In the compaction stage of the JPEG-LS coded data blocks, a parallel compaction function is designed and implemented, which improves the speed of data compaction.

(3) A serial JPEG-LS decoder that adopts the same data partitioning as the parallel JPEG-LS encoder is developed. Because the parallel JPEG-LS encoder is not fully compatible with standard JPEG-LS, a corresponding decoder is needed to confirm the correctness of the encoder. Test results show that images compressed by the parallel JPEG-LS encoder can be recovered exactly.

(4) CUDA optimization strategies are used to optimize the parallel JPEG-LS encoder. Adjusting the mapping between CUDA threads and sub-image blocks balances the memory load. Effective memory access bandwidth is improved by using shared memory, texture memory, the L1 cache, and coalesced memory access. A parallel prefix sum algorithm computes the starting positions of the coded data in the final bitstream. Device overlap is used to run GPU computation and data transfer in parallel.

(5) Four AVIRIS images, which are frequently used and available for public download, are selected as the test data. The test results show that when the sub-image block size is 64×64 pixels, the parallel JPEG-LS encoder obtains a speedup of 26.3x compared with the traditional serial JPEG-LS encoder and can meet the requirements for real-time compression of the images.
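
The data partition and compaction steps described in (2) and (4) can be illustrated with a small host-side sketch. This is not the thesis implementation: it is plain Python, the function names (`tile_grid`, `exclusive_scan`, `compact`) are hypothetical, coded blocks are assumed to be byte-aligned rather than bit-packed, edge tiles are assumed to be clipped to the image boundary, and the scan is a sequential stand-in that produces the same offsets a parallel prefix sum would.

```python
from itertools import accumulate

TILE = 64  # sub-image block edge length reported in the abstract


def tile_grid(height, width, tile=TILE):
    """Return (row, col, tile_height, tile_width) for each sub-image block.

    Assumption: tiles on the right and bottom edges are clipped to the image.
    """
    return [
        (r, c, min(tile, height - r), min(tile, width - c))
        for r in range(0, height, tile)
        for c in range(0, width, tile)
    ]


def exclusive_scan(lengths):
    """Start offset of each block: the sum of the lengths of all earlier blocks.

    Sequential stand-in for a parallel prefix sum; the result is identical.
    """
    if not lengths:
        return []
    inclusive = list(accumulate(lengths))
    return [0] + inclusive[:-1]


def compact(coded_blocks):
    """Pack independently coded blocks into one contiguous byte stream.

    Once the offsets are known, every block can be copied independently,
    which is what makes this step parallelizable on a GPU.
    """
    lengths = [len(block) for block in coded_blocks]
    offsets = exclusive_scan(lengths)
    stream = bytearray(sum(lengths))
    for offset, block in zip(offsets, coded_blocks):
        stream[offset:offset + len(block)] = block
    return bytes(stream), offsets
```

For example, `exclusive_scan([3, 1, 4])` gives the start offsets `[0, 3, 4]`, and `tile_grid(128, 192)` yields six 64×64 tiles. A decoder that uses the same partition would also need the per-block offsets or lengths to locate each block in the stream; how the thesis stores them is not stated in the abstract.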
Keywords/Search Tags: JPEG-LS, CUDA, GPU, Data parallel, AVIRIS images