
Bit-width And Sparsity Adaptive Accelerator Research And Design For Convolution Neural Network

Posted on: 2019-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: J X Guo
Full Text: PDF
GTID: 2428330590951658
Subject: Integrated circuit engineering
Abstract/Summary:
Convolutional neural networks (CNNs) have achieved great success in many computer vision applications, but their high computational complexity hinders further performance improvement. Recently, various FPGA-based accelerators have been proposed to improve CNN performance. Prior research has shown that each CNN layer requires a different short bit-width, and sparse CNN techniques have shown that pruning weights yields a network with a substantial number of zero values, which can reduce an accelerator's computational requirements. However, most state-of-the-art FPGA-based accelerators use the same bit-width for all CNN layers and do not exploit the zero values of sparse CNNs. They typically implement a single convolutional processor (CP), which iteratively computes the CNN layers one at a time with a fixed bit-width. This leads to very low resource utilization and makes further performance improvement difficult. To address this problem, this thesis proposes a short-bit-width and sparsity-adaptive accelerator design for CNNs that simultaneously adapts to short-bit-width and sparse CNNs with varying per-layer bit-width requirements. First, we optimize the DSP operations to achieve higher DSP utilization when the network requires short bit-widths, and construct multiple CPs with different bit-widths to process the CNN layers in parallel. Then, building on the DSP optimization, we propose a sparse CNN accelerator design that jointly exploits short-bit-width weights at the granularity of convolution kernels. Finally, we combine the short-bit-width and sparse CNN optimizations to obtain the optimal kernel combination and hardware structure. As a result, our approach achieves 5.68x to 6.67x higher throughput (6.17x on average) than state-of-the-art FPGA-based CNN accelerators.
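The DSP optimization for short bit-widths mentioned above commonly packs two narrow multiplications into one wide DSP multiplier. The thesis does not give its exact scheme; the following is a minimal sketch of the general idea, assuming unsigned operands and a shift chosen so the low-order product cannot carry into the high-order one (real signed designs on FPGA DSP slices need additional correction terms). All names here are illustrative, not taken from the thesis.

```python
def packed_multiply(a, w1, w2, shift=16):
    """Compute w1*a and w2*a with a single wide multiplication,
    emulating two short-bit-width products sharing one DSP multiplier.

    Assumes a, w1, w2 are unsigned and w2 * a < 2**shift, so the two
    partial products occupy disjoint bit fields of the wide result."""
    packed = (w1 << shift) | w2      # place both weights in one operand
    product = packed * a             # one multiplication yields both products
    hi = product >> shift            # upper field: w1 * a
    lo = product & ((1 << shift) - 1)  # lower field: w2 * a
    return hi, lo

# Example: one 8-bit activation against two 8-bit weights
print(packed_multiply(5, 3, 7))  # (15, 35)
```

With 8-bit operands, each product fits in 16 bits, so a single 27x18 DSP multiplier has room for both fields; this is how per-layer short bit-widths translate into higher DSP utilization.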
Keywords/Search Tags: convolutional neural network, accelerator, FPGA, short bit-width CNN, sparse CNN