Deep neural networks have made remarkable achievements in various domains. To deploy a model with a huge number of parameters and computations on edge computing platforms such as FPGAs, the model needs to be compressed, and quantization is one way to do so. An FPGA is a software-defined hardware chip that can be reprogrammed many times, which facilitates updating the deep neural network; it also offers high performance, low power consumption, and parallelism. Motivated by the above, this thesis studies quantization algorithms for deep neural networks and their FPGA implementation, and designs and implements an FPGA-based image classification system. The main work of this thesis is as follows:

(1) To reduce the hardware complexity of multiplication on FPGAs, this thesis proposes the Sum of Uniform and Power-of-Two (SUPT) quantization algorithm, together with a deployable quantization scheme based on it. On the FPGA platform, SUPT quantization reduces the LUT resources occupied by multiplication, so that multiplication can be implemented efficiently with LUTs. Experimental results show that SUPT quantization performs consistently across multiple models and is more general than uniform or power-of-two quantization alone.

(2) To make full use of the LUT and DSP resources of an FPGA, this thesis first designs a LUT-based multiplier and a DSP-based multiplier-adder. It then designs the convolution, max-pooling, and fully connected modules, combining optimization techniques such as adder trees, ping-pong buffering, and pipelining. Finally, it assembles these modules according to the network structures of ResNet18 and MobileNetV2, and designs and implements two FPGA-based accelerators. The ResNet18 accelerator has an on-chip power consumption of 6.551 W, a throughput of 112.67 GOPS, a latency of about 33.36 ms, and an accuracy of 70.64%; the MobileNetV2 accelerator has an on-chip power consumption of 5.42 W, a throughput of 108.08 GOPS, a latency of about 5.91 ms, and an accuracy of 67.16%.

(3) To put the accelerators to practical use, this thesis designs and implements an FPGA-based image classification system following software engineering practice. The system supports classification of multiple images and displays the classification results and the accelerator latency to users. It also supports switching between the deep neural network accelerators, giving users a choice of models.
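The abstract does not give the details of SUPT. A minimal sketch of one plausible reading, in which each weight is the sum of a value on a uniform grid and a residual rounded to the nearest signed power of two, is shown below; the parameters (`scale`, `bits`, `min_exp`) are illustrative assumptions, not the thesis's exact scheme:

```python
import numpy as np

def supt_quantize(w, scale=0.25, bits=3, min_exp=-6):
    """Quantize w as u + p: u is a uniform term on a grid of step `scale`,
    p is the residual rounded to the nearest signed power of two (or zero).
    Note: a plausible reading of SUPT, not necessarily the thesis's scheme."""
    # Uniform term: round to the signed grid and clip to the bit budget.
    u = np.clip(np.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    r = w - u
    sign = np.sign(r)
    mag = np.abs(r)
    # Power-of-two term: round log2 of the residual magnitude,
    # clip the exponent to [min_exp, 0]; tiny residuals map to zero.
    exp = np.clip(np.round(np.log2(np.where(mag > 0, mag, 2.0 ** min_exp))), min_exp, 0)
    p = np.where(mag >= 0.75 * 2.0 ** min_exp, sign * 2.0 ** exp, 0.0)
    return u + p

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.5, 1000)
u_only = np.clip(np.round(w / 0.25), -4, 3) * 0.25
mse_uniform = np.mean((w - u_only) ** 2)
mse_supt = np.mean((w - supt_quantize(w)) ** 2)
```

The appeal on hardware is that multiplying by such a weight splits into a small uniform multiply plus a bit shift, which maps cheaply onto LUTs; the power-of-two term can only shrink the residual, so the reconstruction error is no worse than uniform quantization alone.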
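As a quick consistency check (not taken from the thesis), throughput times latency should recover the per-image operation count; the reported figures line up with commonly cited workloads of roughly 3.6 GOP for ResNet18 and 0.6 GOP for MobileNetV2 at 224x224 input, counting a multiply-accumulate as two operations:

```python
# Reported throughput (GOPS) x reported latency (s) gives the operations
# executed per image; compare with well-known per-image workloads.
resnet18_gop = 112.67 * 33.36e-3     # ~3.76 GOP per image
mobilenetv2_gop = 108.08 * 5.91e-3   # ~0.64 GOP per image
print(round(resnet18_gop, 2), round(mobilenetv2_gop, 2))  # prints: 3.76 0.64
```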