Research Of FIR Filtering Parallel Algorithm Implemented In Frequency Domain Based On CUDA

Posted on:2013-06-28

Degree:Master

Type:Thesis

Country:China

Candidate:Z Chen

Full Text:PDF

GTID:2298330467964844

Subject:Computer system architecture

Abstract/Summary:

The rapid evolution of the Graphics Processing Unit (GPU) has not only advanced related applications such as virtual reality, computational simulation and image processing, but has also extended the GPU into general-purpose computing. A current trend is to use the Compute Unified Device Architecture (CUDA) platform proposed by NVIDIA to implement high-performance parallel computing applications on the GPU, and more and more computation-intensive applications have improved their performance dramatically through efficient parallel implementations on the GPU.

The Finite Impulse Response (FIR) filter is a fundamental building block widely used in digital signal processing (DSP). Improving the filtering performance of an FIR filter requires a longer tap length, which makes FIR filtering a typical computation-intensive application. Although implementing the FIR filter in the frequency domain reduces the computational complexity significantly compared with the time-domain implementation, high-order FIR filtering of streaming data in systems with high sample rates remains a major challenge.

Based on the overlap-save method, a traditional frequency-domain FIR filtering technique, this thesis presents a highly efficient parallelized overlap-save method that exploits the new-generation GPU architecture, and implements it on an NVIDIA GTX 465 using CUDA. To maximize the use of GPU global memory bandwidth, the parallelized overlap-save method adopts a new data partitioning scheme that divides the input data into chunks whose length equals twice the FFT size. This partitioning simplifies the arrangement of both the input data and the output results, and eliminates the kernel performance degradation caused by conditional (branch) divergence.
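For readers unfamiliar with overlap-save, the following is a minimal CPU reference sketch in pure Python, not the thesis's CUDA implementation. It assumes a radix-2 FFT size `nfft` that is at least the number of taps M; the helper names `fft`, `ifft`, `overlap_save` and `direct_fir` are hypothetical. Each block of `nfft` input samples overlaps the previous block by M-1 samples, and the first M-1 outputs of each circular convolution are aliased and discarded.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse FFT via the conjugation identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def overlap_save(x, h, nfft):
    """FIR-filter x with taps h using overlap-save blocks of length nfft."""
    m = len(h)
    if m == 0 or nfft < m or nfft & (nfft - 1):
        raise ValueError("nfft must be a power of two >= len(h)")
    step = nfft - (m - 1)                     # new output samples per block
    H = fft(list(h) + [0.0] * (nfft - m))     # filter spectrum, computed once
    padded = [0.0] * (m - 1) + list(x)        # zero history for the first block
    y = []
    pos = 0
    while pos < len(x):
        block = padded[pos:pos + nfft]
        block += [0.0] * (nfft - len(block))  # zero-pad the final block
        yb = ifft([a * b for a, b in zip(fft(block), H)])
        y.extend(v.real for v in yb[m - 1:])  # drop the m-1 aliased samples
        pos += step
    return y[:len(x)]


def direct_fir(x, h):
    """Time-domain FIR filtering, used here only to check overlap_save."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]
```

Once a block has been sliced from the input, its FFT, spectral multiplication and inverse FFT do not depend on any other block; that independence is what a GPU version can exploit by processing many blocks in parallel, whereas this sketch simply runs them in sequence.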
Meanwhile, to exploit memory coalescing in GPU global memory, the parallelized overlap-save method optimizes the memory access of the threads in a warp so that adjacent threads in a warp access adjacent data. With these optimizations, the parallelized overlap-save method is well suited to the new-generation architecture. In addition, the algorithm uses the asynchronous data transfer mechanism provided by CUDA to overlap data transfers between host memory and GPU global memory with kernel execution, so that kernel computation and data transfer execute concurrently.

The experimental results show that the parallelized overlap-save method greatly improves performance compared with an overlap-save implementation accelerated by the open-source FFTW library on an Intel Core i7, with a speedup ratio of up to 15.4.
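To illustrate why having adjacent threads access adjacent data matters, the following Python sketch counts how many memory segments one warp touches under two thread-to-element mappings. The 128-byte aligned segment, the 4-byte float sample and the helper names are illustrative assumptions of a simplified model, not the thesis's kernel code or an exact description of the GTX 465 memory system.

```python
# Simplified model (assumption): a warp's loads are served in aligned
# 128-byte segments; real hardware behavior is more nuanced.
WARP_SIZE = 32
SEGMENT_BYTES = 128
ELEM_BYTES = 4  # one float32 sample


def segments_touched(byte_addresses):
    """Number of distinct aligned segments that a warp's loads fall into."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})


def coalesced_addresses(first_elem):
    """Thread t reads element first_elem + t (adjacent threads, adjacent data),
    mirroring the CUDA idiom idx = blockIdx.x * blockDim.x + threadIdx.x."""
    return [(first_elem + t) * ELEM_BYTES for t in range(WARP_SIZE)]


def strided_addresses(first_elem, stride):
    """Thread t reads element (first_elem + t) * stride, as happens when each
    thread walks through its own contiguous chunk of `stride` samples."""
    return [(first_elem + t) * stride * ELEM_BYTES for t in range(WARP_SIZE)]


if __name__ == "__main__":
    print("coalesced:", segments_touched(coalesced_addresses(0)))    # 1
    print("stride 32:", segments_touched(strided_addresses(0, 32)))  # 32
```

Under this model the coalesced mapping serves the whole warp from a single segment, while giving each thread its own 32-sample chunk forces 32 separate segments for the same amount of useful data.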

Keywords/Search Tags:

frequency-domain FIR filtering, GPU, parallel algorithm, CUDA

Related items

1	Parallel Processing Of Remote Sensing Image Filtering Algorithm Based On CUDA
2	Research Of Image Registration And Filtering Method Using CUDA
3	CUDA-based Three-dimensional Non-local Mean Filtering Parallel Algorithm Design
4	Design And Implementation Of Broadband Multichannel Calibration Algorithm Based On CUDA
5	The Research Of Parallel FastSLAM Algorithm Based On CUDA
6	The Technical Research Of Frequency Domain Filter And Dual PRF Data Processing Of The New Generation’s Doppler Weather Radar
7	The Research And Application Of Parallel Particle Swarm Optimization Algorithm Based On CUDA
8	H.264 Encoding Key Module Parallel Algorithm Design And Implementation On CUDA
9	Parallel Optimization Of Loop Filtering In HEVC/H.265 On CUDA
10	Optimization Of CUDA-based Parallel SOM Algorithm And Its Application