Design And Implementation Of Vectorized FFTs On The YHFT-Matrix

Posted on: 2013-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: J H Huang
Full Text: PDF
GTID: 2268330392473804
Subject: Software engineering
Abstract/Summary:
With the emergence of high-performance computing applications such as LTE/4G wireless communication, video coding, image matching, and radar signal processing, single-core scalar processors struggle to meet the demand for high-density real-time computing, and multi-core vector processors have become the mainstream of current processor design. However, algorithms on multi-core vector processors now face great challenges in parallel programming and storage management. How to efficiently exploit the multi-level parallelism of multi-core vector processors has therefore become an active research topic.

YHFT-Matrix is an SDR-oriented high-performance multi-core vector processor designed by the National University of Defense Technology (NUDT). FFT/IFFT is the core algorithm of wireless communication technologies such as OFDM modulation and demodulation and MIMO channel estimation. Hence, the vectorized design and implementation of efficient FFT/IFFT algorithms is of significant value in both theory and application.

The main work of this thesis is as follows:

(1) Based on the architectural features of vector data access, vector processing, and the shuffle network in the single-core YHFT-Matrix, efficient vectorization methods for radix-2, radix-3, radix-4, and radix-5 FFTs are proposed. The methods use the inherent parallelism of FFT algorithms to fully exploit the instruction-level, data-level, and multi-core parallelism of the YHFT-Matrix. Experimental results show that these algorithms achieve high computing performance and speedups. For instance, the 2K-point radix-2 FFT executes in 2985 cycles, a speedup of 15.3 over the TI C62xx at the same clock frequency; the 64K-point radix-4 FFT executes in 91643 cycles, a speedup of 14.48 over the TI C62xx at the same clock frequency.

(2) Based on the aforementioned methods, a vectorization method for hybrid-radix FFTs is further proposed. Experimental results show that the 1200-point hybrid-radix FFT executes in 1982 cycles, which the thesis reports as a performance improvement.

(3) Using the two multi-core data communication mechanisms of the quad-core Matrix (SDP hardware synchronization and Qlink data transfer), the thesis proposes a vectorization method for radix-2 FFTs that supports quad-core parallelism. Experimental results show that the 64K-point radix-2 FFT using quad-core parallelism takes 46953 cycles, a speedup of 2.58 over the single-core Matrix.

(4) Finally, a complete OFDM receiver system is implemented, comprising FFT, bit-reversal reordering, channel estimation, MIMO equalization, and IFFT. Experimental results show that the system achieves high computing performance: the computation time of each sub-frame is 234 µs, well within the 1 ms requirement of LTE.
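For readers unfamiliar with the building blocks named above, the following is a minimal scalar sketch in Python of two of them: bit-reversal reordering followed by an iterative radix-2 decimation-in-time FFT, and a simple way to split a length such as 1200 into radix-2/3/4/5 stages. It is not the thesis's vectorized YHFT-Matrix implementation. The function names (`bit_reverse`, `fft_radix2`, `radix_plan`) and the greedy stage ordering are assumptions made purely for illustration.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT (scalar reference sketch)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Bit-reversal reordering puts the inputs in the order the butterflies need.
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # twiddle increment for this stage
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t          # butterfly: upper output
                a[start + k + half] = u - t   # butterfly: lower output
                w *= w_step
        size *= 2
    return a


def radix_plan(n, radices=(4, 2, 3, 5)):
    """Greedily split n into stages drawn from `radices` (illustrative only)."""
    if n < 1:
        raise ValueError("length must be positive")
    plan = []
    for r in radices:
        while n % r == 0:
            plan.append(r)
            n //= r
    if n != 1:
        raise ValueError("length has a factor outside the given radices")
    return plan
```

Under these assumptions, `radix_plan(1200)` yields the stages 4, 4, 3, 5, 5, which shows why a 1200-point transform needs a hybrid of the radices listed in item (1); the stage order a real implementation would choose may differ.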
Keywords/Search Tags: FFT, Vectorization, Parallel, Multi-core processor, YHFT-Matrix