Font Size: a A A

FFT Implementation And Parallel Optimization Based On X86-64 Computing Platform

Posted on:2020-11-21Degree:MasterType:Thesis
Country:ChinaCandidate:L N CaoFull Text:PDF
GTID:2428330575979904Subject:Software engineering
Abstract/Summary:PDF Full Text Request
The X86-64 Haswell architecture builds on the success of the Sandy Bridge and Ivy Bridge architectures.It is the first processor to support the AVX2 instruction set,which can handle integer instructions of 256-bit vectors.It is also the first Intel processor to support Fusion Multiply and Add(FMA)instructions,and is well supported for high performance computing.DFT(Discrete Fourier Transform)algorithm is the most popular discrete transform algorithm in the field of digital signal processing,and FFT(Fast Fourier Transform)algorithm is a fast method for DFT algorithm.Algorithm.At present,there are relatively few mathematical libraries implemented on the X86-64 platform for the FFT algorithm,and there are very few commercial libraries actually put into use.However,with the popularity and rapid development of the X86-64 platform in the industry,the implementation of the FFT algorithm based on the X86-64 platform has great commercial prospects.This paper mainly describes the implementation of a high-performance FFT algorithm library based on the X86-64 Haswell architecture implemented during the project: OpenFFT.The library is optimized for performance and implementation of the Cooley-Tukey FFT algorithm on the X86-64 Haswell computing platform.The performance comparison object in this paper is MIT open source FFTW(Fast Fourier Transform in the West)algorithm library and Intel MKL FFTW(Intel? Math Kernel Library Fast Fourier Transform in the West)business algorithm library.In this paper,the implementation and optimization of OpenFFT mainly do the following work:1.The Cooley-Tuckey algorithm divides the DFT calculation sequence of length N into m sub-sequences and iterates on it,so an FFT butterfly network containing logmN level is generated,and each sub-sequence in the network of each level is butterfly.Shape calculation.This paper will do specific implementation and performance optimization for the FFT butterfly network.2.For each level of FFT butterfly network,N/m butterfly calculations will be included.Because the value of m is different,it is crucial for the realization and optimization of different butterfly shapes.In this paper,a detailed implementation and optimization scheme for butterfly calculation is proposed.3.According to the input and output data types calculated by FFT,the FFT calculation can be divided into three cases: complex to complex FFT calculation(C2C FFT),complex to real FFT calculation(C2R FFT)and real to complex FFT calculation(R2C FFT).This article will specifically explain the implementation of FFT calculations in these three different situations.4.This paper tests the experimental results from several different situations,that is,whether the input and output data are the same address,and the FFT is the inverse of the FFT.Compared with FFTW 3.3.8,the performance improvement is 5%~180%;compared with Intel MKL FFTW,the performance improvement is 5%~170%.
Keywords/Search Tags:X86-64 Haswell, SIMD, FFTW, Intel MKL FFTW, performance optimization
PDF Full Text Request
Related items