An Efficient Lookup Table Of FFT Parallelism Using CUDA On GPUs

Posted on: 2018-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Ren
GTID: 2348330542460086
Subject: Information and Communication Engineering
Abstract/Summary:
The discrete Fourier transform (DFT) is one of the most influential mathematical formulas of our time and is widely used in many fields of science and engineering. It lies at the core of digital signal processing. In practical applications of the DFT, the complexity and feasibility of the algorithm both affect its efficiency. The fast Fourier transform (FFT) improves efficiency by reducing the complexity of the algorithm.

In recent years GPUs have developed rapidly. Originally built for three-dimensional image processing, they are now commonly used for general-purpose computing, and the growing scale and complexity of problems in scientific research give them particular significance for scientific computation. GPGPU technology aims to let the GPU perform general-purpose computation as a CPU does, and CUDA, launched by NVIDIA, provides a new solution for GPU parallel computing. With the CUDA architecture, GPUs can carry out many data-intensive scientific computations, so a parallel implementation of the FFT on the GPU is of broad significance.

To obtain an efficient FFT, this paper implements a parallel FFT algorithm on the GPU and proposes a method of building a lookup table in texture memory. Using texture memory, two parallel algorithms are designed: the first uses a 2D lookup table and the second a 1D lookup table. The main work of this paper is as follows.

First, because the FFT does not run efficiently enough on the CPU, this paper designs a parallel algorithm that runs on the GPU. By analyzing the hardware architecture and programming model of the GPU, and by studying the principle and characteristics of the decimation-in-time radix-2 FFT, it proposes a feasible design for a parallel FFT using CUDA. The time complexity of the serial FFT is O(N log2 N); the parallel version reduces its three nested loops to two, which shortens the computation.

Second, drawing on a variety of optimization methods, two kinds of texture memory are used to accelerate the algorithm, and the parallel FFT is optimized from different angles. In a parallel programming environment built and configured for this purpose, the two parallel FFT algorithms are tested against the CUFFT library provided with CUDA and show better performance. The experimental data indicate that the GPU design of the parallel FFT algorithm is efficient.

Last, to further improve the efficiency of these algorithms, this paper ports them to multiple GPUs. GPUDirect technology is used for communication between the GPUs, and the experimental results demonstrate the efficiency of the multi-GPU algorithms.
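The lookup-table idea and the two-level loop structure described in the abstract can be illustrated with a small CPU-side sketch. This is not the thesis's CUDA code: it is plain Python, and the names `build_twiddle_table`, `bit_reverse_permute`, and `fft_radix2_lut` are hypothetical. It assumes a 1D table of precomputed twiddle factors (the role texture memory is said to play on the GPU) and flattens the usual group and butterfly loops into one index `t`, which on a GPU would plausibly map to one thread per butterfly.

```python
import cmath


def build_twiddle_table(n):
    """Precompute W_n^k = exp(-2*pi*i*k/n) for k in [0, n/2).

    Assumption: this 1D table stands in for the texture-memory lookup table.
    """
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]


def bit_reverse_permute(x):
    """Reorder the input into bit-reversed index order (needed for DIT)."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        rev = 0
        for b in range(bits):
            if (i >> b) & 1:
                rev |= 1 << (bits - 1 - b)
        out[rev] = x[i]
    return out


def fft_radix2_lut(x):
    """Decimation-in-time radix-2 FFT using a precomputed twiddle table."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    table = build_twiddle_table(n)
    data = bit_reverse_permute([complex(v) for v in x])
    half = 1
    while half < n:                      # loop 1: log2(n) stages
        stride = n // (2 * half)         # table step for this stage
        # loop 2: n/2 independent butterflies; each t touches a disjoint
        # pair of elements, so the iterations could run in parallel.
        for t in range(n // 2):
            group, j = divmod(t, half)
            top = group * 2 * half + j
            bot = top + half
            w = table[j * stride]        # lookup instead of computing exp()
            u, v = data[top], w * data[bot]
            data[top], data[bot] = u + v, u - v
        half *= 2
    return data
```

A serial implementation typically iterates over stages, groups, and butterflies within a group; here the last two are merged into the single index `t`, so only the stage loop remains sequential. The total number of butterflies is unchanged, and only their scheduling differs.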
Keywords/Search Tags:Fast Fourier Transform(FFT), Graphics Processing Unit(GPU), parallel computation, Compute Unified Device Architecture(CUDA)