Font Size: a A A

Research Based On CUDA Parallel Computation Of FFT

Posted on:2013-07-18Degree:MasterType:Thesis
Country:ChinaCandidate:Y WangFull Text:PDF
GTID:2248330395985251Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Discrete Fourier transformation is used in digital signal processing system ofimportant mathematics transform. The feasibility of the algorithm, complexity andefficiency are the important factors that affect the calculation results. In recent years,the rapid development of GPU is much higher than the speed of Moore’s Law. Themainstream GPU, single precision floating-point processing capabilities and externalmemory bandwidth compared to the same period has the obvious advantage of CPU,graphics hardware GPU-based on the general purpose computing is becoming aparallelhot research field. Especially, CUDA-Compute Unified Device architecture of-released by NVIDIA Corporation in2007, has significant improvements inprogramming, optimization, and greatly enhanced the general-purpose GPUcomputing power. CUDA doesn’t need graphic API, and uses the class C languagedevelopment, which makes it easier for programmers to shift from the CPUprogramming model to the GPU programming model. With the continuousimprovement and expansion of processing capacity and range of applications, theGPU has evolved into a highly parallel, multi-threaded, multi-core processor inparallel with the GPU programmability. Using GPU parallel processing capability, theheterogeneous parallel computing systems of characteristics with CPU+GPU hybridaccelerate will become the mainstream of future high-performance computing.This paper first analyzes the CUDA hardware architecture and programmingmodel. By analyzing the present situation of the GPU general calculation presented,the thesis puts forward the method of designing CUDA program. Then it discusses thebasic principle of fast Fourier transformation, elaborates the Decimation-In-Timefrom2-FFT algorithm realization and related properties, analyzes the fast Fourieralgorithm highly parallel the characteristics of partition, combined with CUDAprogramming model and realization mechanism, with CUDA class C language designthe fast Fourier transform parallel algorithm. Improved algorithm change the patternof the past that the program is run entirely on the CPU, with CPU+GPUheterogeneous model, GPU is introduced into the calculation, enable the GPU to takelarge-scale computing program-addition and multiplication of complex numbers, inorder to achieve fast Fourier transform acceleration and optimization. Traditionalserial algorithms transformation realizes the three-loop of FFT from N-point sequence, and the time complexity is O (Nlog2N). The improved algorithm uses thread-levelorganizational structure. It realizes the calculation and operation of the same levelindependent of N/2butterfly operations, which makes the original three-cycle becomea two-tier cycle, and the time complexity becomes O (N). Finally, the paper buildsthe CUDA experimental operating environment, realizes the operation of traditionalfast Fourier transformation algorithm run in the CPU, and the improved algorithm inthe GPU running. Apart from that it also called the code of the FFTW library andCUFFT library code, and compares these results. Analysis of experimental dataproves the superiority of the CUDA architecture to achieve fast Fourier, and verifiesthe dominance of the GPU in dealing with large amounts of data calculation.
Keywords/Search Tags:GPU, Discrete Fourier Transform, CUDA, Parallel-Computing, Heterogeneous model
PDF Full Text Request
Related items