The Fast Fourier Transform (FFT), one of the most time-consuming core algorithms in digital signal processing, is widely used in real-time signal processing systems such as radar and communications. The FFT is also a core benchmark for measuring processor performance. Research on FFT algorithms that fully exploit the parallel processing capability of vector DSPs (Digital Signal Processors) is therefore important. Designing and implementing an efficient FFT algorithm tailored to the architectural characteristics of the FT-M7002 has considerable theoretical significance and application value. In this thesis, we analyze the architectural characteristics of the FT-M7002 and the characteristics of mixed-radix FFT algorithms. To exploit the parallelism of the FT-M7002 architecture, we design and implement single-core assembly programs for small- and medium-scale complex FFTs, real FFTs, and large-scale complex radix-2/radix-4 mixed-radix FFTs. The main work of this thesis is as follows.

(1) We designed and implemented small- and medium-scale complex FFT programs. SIMD (Single Instruction Multiple Data) vectorization is achieved through the computation of twiddle (rotation) factors, the storage layout of the computational data, pointer switching between butterfly groups, and the configuration of shuffle modes. The final reordering of the FFT output is achieved by combining group-wise reordering with DMA indexed transfer, and a double-buffered DMA strategy overlaps the group-wise reordering computation with the DMA indexed transfers to improve performance. Small-scale FFTs use a separate implementation that reduces cycle consumption and improves parallel processing. Experimental results show that the small- and medium-scale complex FFTs achieve an average speedup of 4.14 over the corresponding FFT implementations on TI's TMS320C6678 platform.

(2) We designed and implemented small- and medium-scale real FFT programs and a large-scale complex FFT program. The C2R FFT and R2C FFT are implemented using a split operation together with the complex FFT. For the large-scale complex FFT, the algorithm flow is analyzed in detail, and the processing is divided into three steps according to the number of FFT computation stages. A double-buffered DMA strategy is used throughout so that computation overlaps with data transfer. Experimental results show average speedups of 2.40 for small- and medium-scale C2R FFTs, 2.34 for R2C FFTs, and 4.38 for large-scale complex FFTs over the corresponding FFT implementations on TI's TMS320C6678 platform.
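To make the mixed-radix idea concrete, here is a minimal Python sketch (not FT-M7002 assembly) of a radix-4/radix-2 decimation-in-frequency FFT for power-of-two lengths. The plan of radix-4 stages followed by at most one radix-2 stage, the DIF formulation, and all function names (`mixed_radix_plan`, `dif_fft_inplace`, `digit_reversed`, `fft`) are illustrative assumptions, not the thesis's code. Each stage applies r-point butterflies and then multiplies by the twiddle factors W_span^(n1·k2). The in-place result comes out in digit-reversed order, so the final loop performs the output reordering that the abstract assigns to group-wise reordering plus DMA indexed transfer.

```python
import cmath
import math


def twiddle(n: int, k: int) -> complex:
    """Twiddle ("rotation") factor W_n^k = exp(-2*pi*i*k/n)."""
    return cmath.exp(-2j * math.pi * k / n)


def mixed_radix_plan(n: int) -> list[int]:
    """Assumed plan: radix-4 stages, plus one radix-2 stage when log2(n) is odd."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    m = n.bit_length() - 1
    return [4] * (m // 2) + [2] * (m % 2)


def dif_fft_inplace(x: list[complex], radices: list[int]) -> None:
    """In-place decimation-in-frequency FFT; output is left digit-reversed."""
    n = len(x)
    span = n  # length of the sub-transforms handled in this stage
    for r in radices:
        m = span // r
        for base in range(0, n, span):  # independent butterfly groups
            for n1 in range(m):
                vals = [x[base + n1 + m * n2] for n2 in range(r)]
                for k2 in range(r):
                    # r-point butterfly; for r = 4 the factors are +-1 and +-j
                    acc = sum(vals[n2] * twiddle(r, n2 * k2) for n2 in range(r))
                    # twiddle W_span^(n1*k2) feeds the next, shorter stage
                    x[base + n1 + m * k2] = acc * twiddle(span, n1 * k2)
        span = m


def digit_reversed(p: int, radices: list[int]) -> int:
    """Frequency index stored at position p after dif_fft_inplace."""
    if not radices:
        return 0
    m = math.prod(radices[1:])
    return radices[0] * digit_reversed(p % m, radices[1:]) + p // m


def fft(x: list[complex]) -> list[complex]:
    radices = mixed_radix_plan(len(x))
    buf = [complex(v) for v in x]
    dif_fft_inplace(buf, radices)
    out = [0j] * len(buf)
    for p, v in enumerate(buf):  # output reordering: a pure permutation
        out[digit_reversed(p, radices)] = v
    return out
```

On a SIMD machine the `base`/`n1` loops apply the same butterfly to independent groups, which makes them a natural vectorization axis; the reordering loop only moves data, which is consistent with handing it to an indexed DMA transfer.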
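In its common textbook form, the split operation for real FFTs is a packing trick: a length-N real sequence is viewed as a length-N/2 complex sequence z[n] = x[2n] + j·x[2n+1], transformed with a complex FFT, and then split into the even/odd spectra X_e and X_o, giving X[k] = X_e[k] + W_N^k·X_o[k]. The sketch below assumes this standard formulation (the thesis may arrange the steps differently) and assumes N is a power of two. It uses a plain recursive radix-2 FFT (`cfft`) as a stand-in for the vectorized complex kernel; `rfft_split` and `irfft_split` are hypothetical names.

```python
import cmath
import math


def cfft(a: list[complex]) -> list[complex]:
    """Recursive radix-2 complex FFT (stand-in for the vectorized kernel)."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = cfft(a[0::2]), cfft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def icfft(a: list[complex]) -> list[complex]:
    """Inverse complex FFT via conjugation: conj(FFT(conj(a))) / n."""
    return [v.conjugate() / len(a) for v in cfft([c.conjugate() for c in a])]


def rfft_split(x: list[float]) -> list[complex]:
    """R2C: N real samples -> bins X[0..N/2] using one N/2-point complex FFT."""
    n = len(x)
    m = n // 2
    z = cfft([complex(x[2 * i], x[2 * i + 1]) for i in range(m)])
    out = []
    for k in range(m + 1):
        zk, zc = z[k % m], z[(m - k) % m].conjugate()
        xe = (zk + zc) / 2   # spectrum of the even-indexed samples
        xo = (zk - zc) / 2j  # spectrum of the odd-indexed samples
        out.append(xe + cmath.exp(-2j * math.pi * k / n) * xo)
    return out


def irfft_split(xk: list[complex], n: int) -> list[float]:
    """C2R: bins X[0..N/2] -> N real samples (inverse of rfft_split)."""
    m = n // 2
    z = []
    for k in range(m):
        a, b = xk[k], xk[m - k].conjugate()
        xe = (a + b) / 2
        xo = (a - b) * cmath.exp(2j * math.pi * k / n) / 2
        z.append(xe + 1j * xo)  # re-pack even/odd spectra into one sequence
    out = []
    for v in icfft(z):
        out.extend((v.real, v.imag))  # unpack x[2n], x[2n+1]
    return out
```

The R2C direction returns only the N/2 + 1 non-redundant bins (the rest follow from conjugate symmetry), and the C2R direction undoes the split before a single N/2-point inverse FFT, so both directions reuse the same complex kernel.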
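For the large-scale complex FFT, one common textbook decomposition (often called the four-step algorithm) factors N = N1·N2 and splits the log2(N) stages into three phases: short FFTs over strided columns, a twiddle multiplication, and short FFTs across the columns. Each batch of short FFTs is small enough for local memory while DMA streams the strided data. It is an assumption that the thesis's three-step split corresponds to these phases. The sketch uses a naive DFT as the short-FFT kernel and Python lists in place of DMA buffers; `four_step_fft` and `dft` are hypothetical names.

```python
import cmath
import math


def dft(a: list[complex]) -> list[complex]:
    """Naive O(n^2) DFT, used here as the short-FFT kernel."""
    n = len(a)
    return [sum(a[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x: list[complex], n1: int, n2: int, kernel=dft) -> list[complex]:
    """Length n1*n2 DFT: n2 FFTs of size n1, twiddles, then n1 FFTs of size n2.

    Index maps: input n = n2*i1 + i2, output k = k1 + n1*k2.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n1-point FFTs on the n2 strided columns x[n2*i1 + i2]
    cols = [kernel([x[n2 * i1 + i2] for i1 in range(n1)]) for i2 in range(n2)]
    # Step 2: multiply by the twiddle factors W_N^(i2*k1)
    for i2 in range(n2):
        for k1 in range(n1):
            cols[i2][k1] *= cmath.exp(-2j * math.pi * i2 * k1 / n)
    # Step 3: n2-point FFTs across the columns, written to k = k1 + n1*k2
    out = [0j] * n
    for k1 in range(n1):
        row = kernel([cols[i2][k1] for i2 in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out
```

Choosing n1 and n2 so that each short FFT fits on chip is the main design lever; the strided gather in step 1 and the scattered write in step 3 are the data movements that a DMA engine would normally handle.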
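An idealized cost model shows why double buffering helps. While one buffer is being computed on, the DMA engine drains the previous result from the other buffer and then refills it with the next block. The sketch assumes a single DMA engine shared by loads and stores, no bandwidth contention, and fixed per-block cycle counts; these costs are hypothetical inputs, not measurements from the thesis, and `Block`, `serial_time` and `double_buffered_time` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    load: int     # cycles to DMA the block into on-chip memory
    compute: int  # cycles to process the block
    store: int    # cycles to DMA the result back out


def serial_time(blocks: list[Block]) -> int:
    """Single buffer: load, compute and store strictly one after another."""
    return sum(b.load + b.compute + b.store for b in blocks)


def double_buffered_time(blocks: list[Block]) -> int:
    """Two buffers: block i is computed while the DMA engine works on the other one."""
    if not blocks:
        return 0
    n = len(blocks)
    t = blocks[0].load  # prologue: fill the first buffer
    for i in range(n):
        store_prev = blocks[i - 1].store if i >= 1 else 0
        load_next = blocks[i + 1].load if i + 1 < n else 0
        # other buffer: drain block i-1, then refill it with block i+1
        t += max(blocks[i].compute, store_prev + load_next)
    return t + blocks[-1].store  # epilogue: drain the last result
```

With hypothetical costs of load = store = 40 and compute = 100 cycles for four blocks, the serial schedule takes 720 cycles and the double-buffered one 480, because the transfers hide behind computation; when transfers dominate instead, each step is bounded by DMA time rather than compute time.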