
Convolutional Neural Network Model Compression And Inference Acceleration Based On Look Up Table

Posted on: 2021-05-04    Degree: Master    Type: Thesis
Country: China    Candidate: S Y Xu    Full Text: PDF
GTID: 2428330611499322    Subject: Electronic and communication engineering
Abstract/Summary:
Convolutional neural networks (CNNs) have been widely applied to computer vision tasks and have achieved dramatic accuracy improvements. However, the massive parameter counts and heavy computation requirements of CNNs limit their deployment on mobile terminals, which lack computing power. Parameter quantization to lower bit-widths is a common approach to reducing the computation load of CNN inference. With the parameters replaced by fixed-width binary codes, multiplication operations can be replaced by lookup table (LUT) accesses, where the multiplier-multiplicand operand pair serves as the table index and the pre-calculated products serve as the table elements. Because the histogram profiles of the parameters in different layers/channels of a CNN differ significantly, previous LUT-based computation methods have to use a different LUT for each layer/channel, and consequently demand larger memory space along with extra access time and power consumption.

In this work, we first normalize the parameters' Gaussian profiles of different layers/channels to have similar means and variances, and then quantize the normalized parameters into fixed-width codes through iterative clustering. Because of the normalized parameter profile, only a single compact LUT (16 × 16 entries) is needed to replace all multiplications in the whole network. Experiments on image classification tasks demonstrate that, with a compact 256-entry LUT, we can achieve accuracy comparable to 32-bit floating-point computation while significantly reducing computation load and memory usage. Compared to previous work using LUT-based convolution, both the size and the number of LUTs required for a CNN are significantly reduced.

To verify the effectiveness of the algorithm at the hardware level, this work implements a CNN inference system based on a single lookup table, using an FPGA as the target hardware platform. Based on the characteristics of lookup-table multiplication, a synchronous dataflow computational architecture for the LUT-based CNN is designed. A set of optimizations, e.g. memory partitioning and stream rearrangement, which enables efficient mapping of the LUT-based network to hardware, is proposed. Basic CNN modules, including LUT-based convolution, pooling, and fully connected layers, are implemented in C++. Experiments show that the LUT-based CNN on the PYNQ-Z2 FPGA platform is superior to a fixed-point implementation in resource usage, latency, and throughput: it saves 56.1% of BRAM, 52.1% of DSP utilization, and 21% of power consumption compared with the fixed-point implementation, and achieves nearly 4.5 GOP/s of computing throughput on the PYNQ-Z2, which is 59× faster than an ARM Cortex-A9 processor.
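To make the single-LUT idea concrete, the C++ sketch below shows how a 16 × 16 table of pre-computed products can replace every multiplication in a dot product over 4-bit weight and activation codes. This is a minimal illustration of the scheme described in the abstract, not the thesis implementation; the names (weight_centroids, act_centroids, product_lut, lut_dot) and the use of float-valued products are assumptions.

    // Minimal sketch of LUT-based multiplication with 4-bit codes (illustrative names).
    #include <array>
    #include <cstdint>
    #include <vector>

    // Reconstruction values for the 16 quantization levels of weights and activations.
    // In practice these come from the clustering step; here they are placeholders.
    std::array<float, 16> weight_centroids{};
    std::array<float, 16> act_centroids{};

    // Single 16 x 16 lookup table holding all pre-computed products.
    std::array<std::array<float, 16>, 16> product_lut{};

    void build_lut() {
        for (int w = 0; w < 16; ++w)
            for (int a = 0; a < 16; ++a)
                product_lut[w][a] = weight_centroids[w] * act_centroids[a];
    }

    // Dot product over 4-bit codes: every multiplication becomes a table lookup.
    float lut_dot(const std::vector<uint8_t>& w_codes,
                  const std::vector<uint8_t>& a_codes) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < w_codes.size(); ++i)
            acc += product_lut[w_codes[i]][a_codes[i]];
        return acc;
    }

Because every layer shares the same 256-entry table after normalization, the table can stay in on-chip memory, which is what makes the hardware mapping compact.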
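Likewise, a minimal sketch of the normalize-then-cluster quantization step is given below, assuming per-layer standardization to zero mean and unit variance followed by a plain 1-D k-means-style clustering into 16 levels; the function names, the centroid initialization, and the iteration count are illustrative assumptions, not the thesis code.

    // Sketch of per-layer normalization followed by iterative clustering quantization.
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Standardize one layer's weights to zero mean and unit variance so that all
    // layers share a similar Gaussian profile and can share one LUT.
    void normalize_layer(std::vector<float>& w) {
        double mean = 0.0, var = 0.0;
        for (float x : w) mean += x;
        mean /= w.size();
        for (float x : w) var += (x - mean) * (x - mean);
        double stdev = std::sqrt(var / w.size()) + 1e-12;
        for (float& x : w) x = static_cast<float>((x - mean) / stdev);
    }

    // Iterative 1-D clustering of the normalized weights into 16 levels.
    // Returns the centroids; codes[i] is the 4-bit index assigned to w[i].
    std::vector<float> cluster_quantize(const std::vector<float>& w,
                                        std::vector<uint8_t>& codes,
                                        int levels = 16, int iters = 20) {
        // Initialize centroids uniformly over the observed range.
        float lo = w[0], hi = w[0];
        for (float x : w) { lo = std::min(lo, x); hi = std::max(hi, x); }
        std::vector<float> c(levels);
        for (int k = 0; k < levels; ++k)
            c[k] = lo + (hi - lo) * (k + 0.5f) / levels;

        codes.assign(w.size(), 0);
        for (int it = 0; it < iters; ++it) {
            // Assignment step: nearest centroid.
            for (std::size_t i = 0; i < w.size(); ++i) {
                int best = 0;
                for (int k = 1; k < levels; ++k)
                    if (std::fabs(w[i] - c[k]) < std::fabs(w[i] - c[best])) best = k;
                codes[i] = static_cast<uint8_t>(best);
            }
            // Update step: each centroid becomes the mean of its assigned weights.
            std::vector<double> sum(levels, 0.0);
            std::vector<int> cnt(levels, 0);
            for (std::size_t i = 0; i < w.size(); ++i) { sum[codes[i]] += w[i]; ++cnt[codes[i]]; }
            for (int k = 0; k < levels; ++k)
                if (cnt[k] > 0) c[k] = static_cast<float>(sum[k] / cnt[k]);
        }
        return c;
    }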
Keywords/Search Tags: Convolutional Neural Network, Network Quantization, FPGA, Accelerator, Power-Efficient Design