
Research On Turbo Decoding Algorithm Based On Parallel Computing Architecture

Posted on: 2015-04-30    Degree: Master    Type: Thesis
Country: China    Candidate: C J Liu    Full Text: PDF
GTID: 2298330467963514    Subject: Signal and Information Processing
Abstract/Summary:
As a practical code that approaches channel capacity, Turbo codes have attracted interest from academia and industry and have been a focus of study since they were introduced. Turbo codes are widely used in many 3G and 4G wireless standards, such as CDMA2000, WCDMA/UMTS, IEEE 802.16e WiMAX, and 3GPP LTE (Long Term Evolution).

Parallelism is a means of improving a program's performance without increasing the clock frequency. Because of hardware resource constraints, clock-frequency limits, and other restrictions, parallel computing has attracted more and more attention. Processor architectures expose parallel computation resources to programs in different ways and at different granularities. Bit-level parallelism improves computing efficiency by increasing the processor's word length. Pipelined execution in an in-order scalar processor can be seen as a way of exploiting instruction-level parallelism, overlapping different stages of execution from multiple instructions. Data-level parallelism means that many different data items are processed simultaneously by the same instruction, instruction set, or algorithm. Task-level parallelism means that many different data items are processed simultaneously by different instructions, instruction sets, or algorithms.

GPU (graphics processing unit) computing is based on the single-instruction, multiple-thread (SIMT) model. As the brain of the graphics card, the GPU determines the quality of the graphics and most of the performance. Originally used only for 3D graphics, GPUs are now widely used for general-purpose parallel computing. CUDA is a parallel computing framework that makes general-purpose GPU computing convenient. The thesis briefly discusses parallel computing and expounds, in turn, bit-level, instruction-level, data-level, and task-level parallelism, then focuses on general-purpose computing based on the SIMT model.
The thesis then elaborates two general-purpose computing frameworks, CUDA and OpenCL, with emphasis on the thread hierarchy, memory hierarchy, and computation model of the CUDA architecture. Next, the theory of Turbo codes, the MAP algorithm, and simplified MAP algorithms are discussed.

This thesis focuses on the design of a parallel Turbo decoder on the GPU. Guided by the characteristics of the Turbo decoding algorithm and the general-purpose hardware architecture, it improves decoder throughput by increasing parallelism, optimizing shared-memory usage, and optimizing instructions. Four levels of parallelism are implemented: multi-code-level, sub-decoder-level, sub-frame-level, and trellis-state-level parallelism. The thesis designs a Grid-Block-Thread model suited to these four levels, in which a thread block consists of 64 threads, together with a thread-control algorithm that resolves the shared-memory bank conflicts caused by eight threads computing one extrinsic value at the same time. The thesis's innovation is the addition of sub-decoder parallelism, which raises the overall parallelism of the Turbo decoder. For shared memory, the thread-control scheme eliminates bank conflicts and thus reduces memory access latency; the optimization of control instructions ensures that the threads in one block follow the same execution path, which further improves decoder throughput.

Finally, the thesis proposes a parallel Turbo decoder based on CUDA and implements it on a GPU, reporting results on a GeForce GTX 550 Ti. The parallel Turbo decoder's throughput reaches 70 Mbps while maintaining a reasonable BER. On the algorithmic side, the thesis also studies the number of decoding iterations required under sub-decoder-level, sub-frame-level, and trellis-state-level parallelism.
Keywords/Search Tags: CUDA, GPGPU, Turbo, parallel decoder, SIMT