FFT Implementation And Parallel Optimization Based On X86-64 Computing Platform

Posted on:2020-11-21

Degree:Master

Type:Thesis

Country:China

Candidate:L N Cao

Full Text:PDF

GTID:2428330575979904

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

The X86-64 Haswell architecture builds on the success of the Sandy Bridge and Ivy Bridge architectures.It is the first processor to support the AVX2 instruction set,which can handle integer instructions of 256-bit vectors.It is also the first Intel processor to support Fusion Multiply and Add(FMA)instructions,and is well supported for high performance computing.DFT(Discrete Fourier Transform)algorithm is the most popular discrete transform algorithm in the field of digital signal processing,and FFT(Fast Fourier Transform)algorithm is a fast method for DFT algorithm.Algorithm.At present,there are relatively few mathematical libraries implemented on the X86-64 platform for the FFT algorithm,and there are very few commercial libraries actually put into use.However,with the popularity and rapid development of the X86-64 platform in the industry,the implementation of the FFT algorithm based on the X86-64 platform has great commercial prospects.This paper mainly describes the implementation of a high-performance FFT algorithm library based on the X86-64 Haswell architecture implemented during the project: OpenFFT.The library is optimized for performance and implementation of the Cooley-Tukey FFT algorithm on the X86-64 Haswell computing platform.The performance comparison object in this paper is MIT open source FFTW(Fast Fourier Transform in the West)algorithm library and Intel MKL FFTW(Intel? Math Kernel Library Fast Fourier Transform in the West)business algorithm library.In this paper,the implementation and optimization of OpenFFT mainly do the following work:1.The Cooley-Tuckey algorithm divides the DFT calculation sequence of length N into m sub-sequences and iterates on it,so an FFT butterfly network containing logmN level is generated,and each sub-sequence in the network of each level is butterfly.Shape calculation.This paper will do specific implementation and performance optimization for the FFT butterfly network.2.For each level of FFT butterfly network,N/m butterfly calculations will be included.Because the value of m is different,it is crucial for the realization and optimization of different butterfly shapes.In this paper,a detailed implementation and optimization scheme for butterfly calculation is proposed.3.According to the input and output data types calculated by FFT,the FFT calculation can be divided into three cases: complex to complex FFT calculation(C2C FFT),complex to real FFT calculation(C2R FFT)and real to complex FFT calculation(R2C FFT).This article will specifically explain the implementation of FFT calculations in these three different situations.4.This paper tests the experimental results from several different situations,that is,whether the input and output data are the same address,and the FFT is the inverse of the FFT.Compared with FFTW 3.3.8,the performance improvement is 5%~180%;compared with Intel MKL FFTW,the performance improvement is 5%~170%.

Keywords/Search Tags:

X86-64 Haswell, SIMD, FFTW, Intel MKL FFTW, performance optimization

PDF Full Text Request

Related items

1	FFT Implementation And Optimization On ARM V8 Platform
2	Research On Key Issues Of Performance Optimization In High Performance Computing Based On The Godson
3	Research Of3PCF Parallel Algorithm Based On MPI
4	Remote Sensing Image Fast Fourier Transform To The Research And Implementation Of The New Algorithm
5	Research On The Key Technology Of FFT Algorithm Implementation And Optimization For SIMD Computing Platform
6	Research Of Location Algorithm And Pattern Recognition Technology Based On Dual M-Z Perimeter Protection System
7	Research Of Programming And Performance Optimization Under Intel Many-core Architecture
8	The Implementation And Optimization For Lte Pusch Receiver Based On The General Purpose Processor
9	Optimization And Implementation Of The DSP’s Benchmark Base On SIMD Architecture
10	Research On Transmission Characteristics Of Planar Optical Waveguide With Arbitrary Refractive Index Distribution