Parallel Algorithm Design Of Communication Baseband Signal Processing Based On GPU

Posted on: 2017-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: X Y Cheng  Full Text: PDF
GTID: 2308330485486059  Subject: Communication and Information System
Abstract/Summary:
Software Defined Radio (SDR) is an important platform for research and implementation in communications. Traditional SDR platforms adopt a hybrid DSP and FPGA architecture, which makes it hard to achieve rapid development and high system throughput at the same time. Since the Graphics Processing Unit (GPU) provides powerful parallel computing capability and an easy programming platform, this thesis studies a GPU-based SDR platform for implementing communication baseband signal processing algorithms.
First, we describe the GPU hardware architecture and the CUDA programming platform. After studying the GPU hardware architecture and its memory access scheme, we give optimization strategies for implementing an algorithm on GPU.
Secondly, we implement the MMSE detection algorithm for MIMO systems on GPU. We discuss methods for realizing the basic matrix operations on GPU. Based on the GPU architecture, we propose a jointly optimized implementation of the MMSE detection algorithm built from three cascaded kernels. Matrix inversion is the most complex operation in MMSE detection, and we implement a Gauss-Jordan matrix inversion algorithm efficiently with an optimized shared memory access scheme. We implement a 4×4 16-QAM MIMO system and a 4×8 16-QAM MIMO system on GPU.
Finally, we present three GPU-based Turbo decoder architectures for 3GPP LTE: the traditional Turbo decoder, the subcode parallel Turbo decoder, and the fully parallel Turbo decoder. For the traditional Turbo decoder, we parallelize over the 8 trellis states and over multiple codewords. In the forward and backward updates, we use shared memory and registers so that global memory is accessed only once. In the extrinsic information computation, 8 threads work in parallel alternately over the 8 states and over 8 time steps, based on shared memory. In this way, the computing resources are fully utilized for the traditional Turbo decoder architecture.
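As a rough illustration of the MMSE detection step described above, the following sequential Python sketch computes the estimate as x_hat = (H^H H + sigma^2 I)^-1 H^H y, inverting the Gram matrix by Gauss-Jordan elimination. It is not the thesis's CUDA implementation: the function names, the unit-energy symbol assumption, the partial pivoting, and the pure-Python data layout are all assumptions made for this example, and the three-kernel split and shared memory scheme are not modeled.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix (list of lists, real or complex) by Gauss-Jordan elimination."""
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting (an assumption of this sketch): pick the largest-magnitude entry.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalize the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


def mmse_detect(h, y, noise_var):
    """Return x_hat = (H^H H + noise_var * I)^-1 H^H y, assuming unit-energy symbols.

    h is an n_rx x n_tx channel matrix, y the received vector of length n_rx.
    """
    n_rx, n_tx = len(h), len(h[0])
    # Conjugate transpose H^H (n_tx x n_rx).
    hh = [[h[r][c].conjugate() for r in range(n_rx)] for c in range(n_tx)]
    # Regularized Gram matrix H^H H + noise_var * I.
    gram = [[sum(hh[i][k] * h[k][j] for k in range(n_rx))
             + (noise_var if i == j else 0.0)
             for j in range(n_tx)] for i in range(n_tx)]
    inv = gauss_jordan_inverse(gram)
    # Matched-filter output H^H y.
    matched = [sum(hh[i][k] * y[k] for k in range(n_rx)) for i in range(n_tx)]
    return [sum(inv[i][j] * matched[j] for j in range(n_tx)) for i in range(n_tx)]


if __name__ == "__main__":
    # Hypothetical 2x2 channel and received vector, for illustration only.
    h_demo = [[1 + 1j, 0.5], [0.2j, 1 - 0.5j]]
    y_demo = [0.3 + 0.1j, -0.7 + 1.2j]
    print(mmse_detect(h_demo, y_demo, noise_var=0.1))
```

On a GPU, the row operations inside each elimination step are independent of one another, which is what makes Gauss-Jordan inversion a natural fit for one thread per matrix element or per row.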
For the subcode parallel Turbo decoder, we decode multiple subcodes of a single codeword in parallel and use the PIVI compensation algorithm to preserve the BER performance. For the fully parallel Turbo decoder, we use a two-level iteration consisting of an outer iteration on the CPU and an inner iteration on the GPU. In the inner iteration, we use two sets of shared memory to update α and β in ping-pong mode. To guarantee the BER performance, we use an asynchronous mode to update the initial value of α and the end value of β. We implement these three Turbo decoders on GPU based on the FULL-LOG-MAP and MAX-LOG-MAP algorithms.
According to the simulation results, the GPU-based MMSE detection algorithm achieves a throughput of 84 Mbps and the GPU-based Turbo decoder achieves a throughput of 28 Mbps. Therefore, GPU can serve as an alternative to DSP and FPGA in SDR platform designs.
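The two decoding algorithms named above are commonly distinguished by how they combine path metrics in the log domain. The sketch below shows that textbook difference only; the function names are chosen for this example, and it does not reproduce the thesis's kernels or its trellis handling.

```python
import math


def max_star(a, b):
    """Jacobian logarithm ln(e^a + e^b): the exact combine step used by Log-MAP."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-Log-MAP approximation: keeps the maximum and drops the correction term."""
    return max(a, b)
```

The correction term is at most ln 2 (when the two metrics are equal) and shrinks as they move apart, which is why the Max-Log-MAP variant trades a small BER loss for cheaper arithmetic.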
Keywords/Search Tags: GPU, Parallel, MIMO, MMSE detection algorithm, Turbo code