
Parallel Implementation And Optimization Of WNC Arithmetic Coding Based On GPU

Posted on: 2016-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: J K Tan
Full Text: PDF
GTID: 2308330461966960
Subject: Computer system architecture
Abstract/Summary:
In recent years, the rapid development of graphics processing unit (GPU) technology has brought more processing cores and higher memory bandwidth, improving the GPU's processing power and programmability and making GPU-based massively parallel computing practical. Arithmetic coding is an effective entropy-coding method for lossless data compression. When the symbol set and symbol probabilities are known, its output is close to the optimal code: because arithmetic coding can approach the source entropy at a fractional number of bits per symbol, it achieves better coding efficiency than Huffman coding. However, arithmetic coding is strictly sequential, and its applications are limited by the high computational cost of its frequent multiplication and branching operations. By studying and analyzing the classical WNC arithmetic coding scheme, this thesis designs and implements a block-parallel arithmetic coder that uses the parallel computing power of the GPU to improve coding performance. The main research contents and conclusions of this study are as follows:

(1) A parallel 4-table-update arithmetic coding model is implemented on the GPU. Building on the 4-table-update model of classical WNC arithmetic coding, the CUDA programming model is adopted: the source data is divided into many small blocks, each CUDA thread encodes one small data block, and the threads' output bitstreams are then concatenated to form the final coding result, yielding a GPU-based block-parallel arithmetic coding scheme.

(2) The 4-table-update model is simplified.
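The interval-narrowing step that makes arithmetic coding strictly sequential, and the block division used in the parallel scheme of point (1), can be illustrated with a toy Python sketch. This is not the thesis's WNC implementation: the three-symbol alphabet and probabilities are assumed for illustration, and exact fractions stand in for the finite-precision integer arithmetic (with carry handling) that a real WNC coder uses.

```python
from fractions import Fraction

def encode_block(symbols, cum):
    """Narrow [low, high) once per symbol. Each step depends on the
    previous interval, which is why a single stream cannot be
    parallelized; only independent blocks can be coded concurrently."""
    low, high = Fraction(0), Fraction(1)
    for s in symbols:
        width = high - low
        high = low + width * cum[s][1]
        low = low + width * cum[s][0]
    return low, high  # any number in [low, high) identifies the block

# Assumed model: P(a)=1/2, P(b)=1/4, P(c)=1/4 -> cumulative intervals
cum = {"a": (Fraction(0), Fraction(1, 2)),
       "b": (Fraction(1, 2), Fraction(3, 4)),
       "c": (Fraction(3, 4), Fraction(1))}

data = "aabacabb"
block_size = 4
# Block-parallel scheme: each block is encoded independently
# (on the GPU, one CUDA thread per block); outputs are concatenated.
blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
codes = [encode_block(b, cum) for b in blocks]
for b, (lo, hi) in zip(blocks, codes):
    print(b, float(lo), float(hi))
```

Note that the final interval width equals the product of the symbol probabilities, so the code length per block is about the block's self-information, which is the fractional-bit efficiency mentioned above.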
The tables of the 4-table-update scheme occupy too many GPU resources during encoding, so this study reduces the number of tables, simplifying the model to a 1-table model, and uses a two-layer prefix-sum algorithm to compute the starting position of each data block so that the block outputs can be concatenated into the final result. The simplified 1-table model is therefore better suited to parallel computing and storage on the GPU.

(3) The GPU-based parallel arithmetic coder is optimized. The optimization measures include asynchronous data transfer, shared-memory access, and coalesced global-memory access. Asynchronous data transfer reduces data transmission time; coalesced global-memory access reduces the time spent reading the data sequence; and shared-memory access raises effective memory bandwidth through faster on-chip access. The results show that asynchronous data transfer and coalesced global-memory access both yield clear gains for the parallel scheme; when the data size is 64 MB, coalesced global-memory access improves performance by 80.2% over uncoalesced access.

Four remote-sensing images are selected to test the performance of the parallel arithmetic coder. The experimental results show maximum speedups of 41.7 times for the 4-table-update model and 49.1 times for the 1-table model compared with their serial programs, confirming that the 1-table model has better performance.
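The two-layer prefix sum used to locate each block's output in the concatenated bitstream can be sketched on the CPU. This Python version assumes each encoded block's bit length is already known; `group_size` stands in for the CUDA thread-block size (layer 1 would run per thread block on the GPU, layer 2 over the per-group totals).

```python
def two_layer_exclusive_scan(lengths, group_size):
    """Two-layer exclusive prefix sum over per-block output lengths,
    returning the starting bit position of each block's bitstream.
    Layer 1: scan each group independently (one CUDA thread block per group).
    Layer 2: scan the group totals, then add each group's base offset back."""
    groups = [lengths[i:i + group_size]
              for i in range(0, len(lengths), group_size)]
    # Layer 1: local exclusive scans plus per-group totals
    local, totals = [], []
    for g in groups:
        acc, scan = 0, []
        for x in g:
            scan.append(acc)
            acc += x
        local.append(scan)
        totals.append(acc)
    # Layer 2: exclusive scan of the group totals gives group bases
    acc, base = 0, []
    for t in totals:
        base.append(acc)
        acc += t
    # Combine: global offset = group base + local offset
    return [b + off for b, scan in zip(base, local) for off in scan]

# Example: bit lengths produced by 8 blocks, grouped 4 per "thread block"
lengths = [13, 7, 22, 5, 9, 31, 2, 17]
print(two_layer_exclusive_scan(lengths, 4))
# -> [0, 13, 20, 42, 47, 56, 87, 89]
```

Because block i's offset equals the sum of lengths of blocks 0..i-1, every thread can copy its bitstream to its final position independently, which is what allows the concatenation step itself to run in parallel.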
Keywords/Search Tags: graphics processing unit, arithmetic coding, compute unified device architecture, parallel computing, speedup