Due to the large number of parameters and computations required by convolutional neural networks, it is challenging for conventional embedded devices to achieve real-time processing. FPGAs, however, have emerged as a primary hardware platform for deploying convolutional neural networks in embedded systems, owing to their abundant computing resources, flexible development and deployment options, and low power consumption. This paper focuses on the deployment of the linear convolution calculations and nonlinear activation function calculations of convolutional neural networks on FPGAs, as follows. Firstly, the model was quantized within an acceptable accuracy range, and an efficient, flexible convolution computing engine was designed around 8-bit quantized data to accelerate convolution kernels of various sizes. To enhance the computing power of the DSPs, double 8-bit multiplication was implemented on a single DSP, and the cascaded operation was further extended to 16 DSPs. Additionally, the DSP clock frequency was set to twice the system clock frequency to further increase DSP computing power. To address the resulting clock domain crossing issue, corresponding timing-constraint solutions and a data caching strategy are proposed. Secondly, to address the difficulty and high resource consumption of deploying nonlinear activation functions on FPGAs, Auto-LUT, a nonlinear activation function method based on lookup tables and piecewise linear approximation, is proposed. This method significantly reduces on-chip lookup table and flip-flop consumption while maintaining accuracy. Compared with the NN-LUT method, Auto-LUT reduces the approximation error by 4.32%, lookup table usage by 56.32%, and flip-flop usage by 32.31%. Finally, based on the above optimization methods, an FPGA-based face recognition system was designed and tested on an open dataset. The experimental results show that the recognition time of the FPGA-based face recognition system is only 22 ms; compared with CPUs and GPUs, the FPGA has clear advantages in speed and power consumption. The designed FPGA-based convolutional neural network accelerator achieves a performance of 1130.49 GOPS@INT8 with a power consumption of only 7.832 W, combining high performance with low power consumption.
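As context for the double 8-bit multiplication on a single DSP mentioned above, the following minimal Python sketch emulates one common packing scheme: two signed 8-bit multiplicands that share the same 8-bit multiplier are packed into one wide operand, a single wide multiplication is performed, and the two products are unpacked with a sign correction. The 18-bit field width, the shared-multiplier arrangement, and the function name are illustrative assumptions, not the paper's exact DSP configuration:

```python
def packed_int8_mults(a, b, c):
    """Emulate two signed 8-bit multiplications (a*c and b*c) with one wide
    multiplier by packing a and b into a single operand, a*2^18 + b.
    Illustrative sketch of the double-INT8-per-DSP idea; the 18-bit field
    width is an assumption, not the paper's DSP configuration.
    """
    SHIFT = 18                          # field wide enough to hold an 8x8-bit product
    packed = (a << SHIFT) + b           # pre-add style packing of the two multiplicands
    product = packed * c                # one wide multiplication (the "DSP" operation)

    low = product & ((1 << SHIFT) - 1)  # lower field carries b*c
    if low >= 1 << (SHIFT - 1):         # interpret the field as two's complement
        low -= 1 << SHIFT

    high = product >> SHIFT             # upper field carries a*c ...
    if low < 0:                         # ... minus 1 when b*c borrowed from it
        high += 1
    return high, low                    # (a*c, b*c)


# Quick check over the full signed 8-bit range of the shared multiplier.
assert all(packed_int8_mults(a, b, c) == (a * c, b * c)
           for a in (-128, -1, 0, 127)
           for b in (-128, -1, 0, 127)
           for c in range(-128, 128))
```

The same arithmetic is what allows a single wide hardware multiplier to produce two INT8 products per operation, at the cost of a small unpacking and sign-correction step.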