Synthesis Of Floating-point Hardware DFT

Posted on: 2018-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: G Feng
Full Text: PDF
GTID: 2348330512986705
Subject: Electronic Science and Technology
Abstract/Summary:
The Discrete Fourier Transform (DFT) is widely used in almost all fields of science and engineering. Modern applications that process big data, such as images and sound, require increasingly complex features, such as long and non-power-of-two hardware DFTs and floating-point operations with wide ranges and high effective resolutions. Convolutional Neural Networks (CNNs) are likewise widely applied in modern machine learning and pattern recognition. Beyond performance alone, more and more attention is being paid to energy-efficient and scalable devices such as FPGAs as a better solution than CPUs and GPUs. In this work, we propose a method that extends the matrix-factorization-based DFT algorithm so that it also performs non-power-of-two DFTs whose length N is a product of coprime numbers. The algorithm is proved correct, and it provides a new method for calculating the input and output sequences that differs from the traditional DFT algorithm. We also present AutoNFT, a new, highly portable DFT architecture synthesizer that generates hardware DFTs in a fully parallel structure. It automatically generates fully pipelined hardware structures for SRFFT-based algorithms, which consume fewer multipliers than the radix-2 and radix-4 FFT algorithms, by introducing shift-register-based FIFOs to align the data paths, and it automatically cascades the non-power-of-two sub-length DFT hardware structures by sequencing the output signals. The architecture also contains a high-performance floating-point core designed to work at 1 GHz. In addition, a highly optimized FPGA-based CNN accelerator for LeNet-5 is designed on the Zynq-7000 platform. The LeNet-5 accelerator predicts MNIST with a low error rate of 0.99% and a power dissipation of 3.32 W, with 37% higher throughput and 93.7% energy savings compared with Caffe. DFTs generated by AutoNFT can run at 500 MHz using a 40 nm industry library. This technology can handle 115 billion fixed-point samples per second on a 256-point DFT and 13.5 billion floating-point samples per second on a 30-point DFT.
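The coprime-length case described in the abstract can be illustrated with the classic Good-Thomas prime-factor index mapping. The sketch below is a software illustration only, and it is an assumption that this textbook mapping resembles the thesis's approach: it is not the thesis's matrix-factorization algorithm, nor anything generated by AutoNFT. The function names `naive_dft` and `pfa_dft` are hypothetical, and the 30 = 5 x 6 split is chosen only because the abstract mentions a 30-point DFT.

```python
import cmath
from math import gcd


def naive_dft(x):
    """Direct O(n^2) DFT, used both as the small sub-DFT and as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def pfa_dft(x, n1, n2):
    """Length n1*n2 DFT via the Good-Thomas mapping (requires gcd(n1, n2) == 1).

    Illustrative sketch only; not the algorithm from the thesis.
    """
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1*n2 with n1 and n2 coprime")

    # Input reordering: sample index (a*n2 + b*n1) mod n goes to grid[a][b].
    grid = [[x[(a * n2 + b * n1) % n] for b in range(n2)] for a in range(n1)]

    # Length-n2 DFTs along each row; no twiddle factors are needed in between.
    rows = [naive_dft(row) for row in grid]

    # Length-n1 DFTs along each column; cols[k2][k1] holds the 2-D result.
    cols = [naive_dft([rows[a][k2] for a in range(n1)]) for k2 in range(n2)]

    # Output reordering via the Chinese remainder theorem:
    # k is the unique index with k % n1 == k1 and k % n2 == k2.
    inv_n2 = pow(n2, -1, n1)  # modular inverse, Python 3.8+
    inv_n1 = pow(n1, -1, n2)
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            k = (k1 * n2 * inv_n2 + k2 * n1 * inv_n1) % n
            out[k] = cols[k2][k1]
    return out


if __name__ == "__main__":
    samples = [complex(i % 7, -(i % 3)) for i in range(30)]
    fast = pfa_dft(samples, 5, 6)
    ref = naive_dft(samples)
    print(max(abs(a - b) for a, b in zip(fast, ref)))
```

Because the two sub-lengths are coprime, the only cost beyond the small DFTs is the reordering of the input and output sequences, which is why such decompositions are attractive for cascading sub-length DFT blocks in hardware.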
Keywords/Search Tags: DFT, floating-point, fixed-point, non-power-of-two, FPGA, ASIC, VHDL, autogeneration, synthesis, CNN