Font Size: a A A

Real FFT Implementation And Performance Optimization On ARM V8 Processor

Posted on:2022-01-16Degree:MasterType:Thesis
Country:ChinaCandidate:J X GuoFull Text:PDF
GTID:2518306542983609Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Fast Fourier Transform(FFT)algorithm is a Fast algorithm of Discrete Fourier Transform(DFT)or its inverse Transform.It is an important part of the processor based software ecology,and has been widely used in engineering,science,physics and mathematics.Real FFT algorithm,as a discrete Fourier transform whose input or output is a sequence of real numbers,has a wide range of applications in intelligent computation,image processing,mathematics and other fields.With the increasing complexity of application scenarios,the performance of FFT algorithm in these application fields is also put forward higher and higher requirements.Therefore,it is of great significance and application value to study the high performance implementation and optimization method of FFT algorithm,especially the real number FFT algorithm,to meet the increasing performance demand in the application field.With the development of ARM architecture,especially the launch of ARM V8-A new generation of ARM architecture,the application field of ARM gradually expands from embedded end to server end.China's Tianhe E-class computing prototype and Japan's "Fuyue" supercomputer system all adopt ARM architecture.With the continuous expansion of the application field of ARM processor,the construction of basic software ecology based on AMR architecture has become the current research hotspot.FFT is an important part of the basic software ecology.It is of great practical significance to study the high-performance implementation of FFT algorithm on ARMV8 architecture.According to the architecture characteristics of ARMV8 computing platform,this paper from the butterfly network optimization,reducing network series,big butterfly calculation optimization,SIMD assembly optimization and register using strategy optimization of complex FFT algorithm is studied from the aspects such as high-performance realization and optimization method,especially for FFT calculation characteristics of large base,broke the due to lack of register resources performance bottlenecks and summarizes a set of high-performance implementation strategy of Cooley-Tukey FFT algorithm and optimization;On this basis,the paper studies the high performance implementation and optimization method of real FFT algorithm,defines the butterfly network construction and butterfly calculation method of real FFT of any scale,and finally completes the high performance implementation of R2 C and C2 R FFT algorithm on ARM.The experimental results show that the Performance of the large base complex FFT algorithm and the real number FFT algorithm implemented on the ARMv8 Huawei Kunpeng 920 processor is significantly improved compared with the ARM Performance Library and the open source FFT algorithm Library Fast Fourier Transform in the West.The Performance of the large complex base FFT is significantly improved compared with the small and medium base FFT algorithm.Compared with C2C?Split,the real FFT algorithm has better performance.The high-performance fast Fourier transform algorithm library studied in this paper has made great contributions to Kunpeng community and is of great significance to Chinese domestic processor and basic software ecosystem.The main contributions of this paper are as follows:(1)For complex FFT algorithm,summarize and reconstruct the butterfly network,and use the symmetry and periodicity of DFT matrix to greatly reduce the complexity of large base butterfly calculation;Especially for the calculation characteristics of R14,R20 and other large bases,it solves the performance bottleneck caused by the lack of registers.(2)For the real FFT algorithm,the real FFT calculation method was reconstructed,and the performance of the real FFT algorithm was greatly improved through butterfly network optimization,butterfly calculation optimization and underlying assembly parallelism.Finally,the real number FFT calculation of any scale is realized,which makes up for the defect that the existing C2C?Split algorithm cannot calculate the odd number sequence.It has improved the Open FFT algorithm library and contributed to Kunpeng community.(3)A set of implementation strategy and optimization scheme of FFT algorithm based on ARMv8 architecture is proposed,and a high-performance FFT algorithm library which can be transplanted across platforms is built.
Keywords/Search Tags:ARM V8, Fast Fourier Transform, real Fast Fourier Transform, SIMD optimization, performance optimization
PDF Full Text Request
Related items