This paper proposes a configurable convolutional neural network (CNN) accelerator based on ZYNQ, which can not only build various CNN models to perform edge inference tasks but also adjust the accelerator's hardware resource usage through configuration parameters, making better use of the target hardware platform. The main contributions of this work are as follows:

(1) To address the problem that the large volume of CNN parameters cannot be fully loaded into the on-chip cache, the accelerator integrates multiple data partitioning methods and proposes an adaptive data reuse strategy that reduces the total volume of transmitted data by comparing and analyzing the candidate reuse methods (a sketch of this comparison follows the summary below).

(2) To support fast construction of CNNs, this work encapsulates the required CNN parameters and defines a dedicated CNN instruction set; users can quickly build multiple CNN models by invoking this instruction set (a layer-descriptor sketch is given below).

(3) To facilitate deployment of the accelerator on different hardware platforms, this paper proposes a software-hardware co-configuration scheme. Data bit width, MAC array parallelism, and intermediate cache size are treated as configurable parameters, and different configuration options can be selected to adjust overall resource usage to fit the target FPGA platform (see the configuration sketch below).

(4) To achieve lower power consumption at the same throughput, a clock domain partitioning scheme is proposed: the core computing module runs in a high-frequency clock domain while non-core modules run in a low-frequency clock domain, which further improves circuit timing.

Experimental validation is conducted on the Xilinx ZCU104 board. The results show that with the MAC array parallelism set to 1024, the data bit width set to 8, and the core acceleration engine running at 180 MHz, the accelerator reaches a peak throughput of 180 GOPS with a power consumption of 3.752 W, for an energy efficiency of 47.97 GOPS/W. For the VGG16 network, the average MAC utilization of the convolutional layers reaches 84.37%. These results demonstrate that the proposed accelerator offers good configurability and high efficiency, making it well suited to edge computing and embedded devices.
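The adaptive data reuse strategy of contribution (1) is only summarized above. The sketch below is written under assumptions not stated in the abstract: a simplified traffic model that ignores kernel halos and assumes stride 1, and illustrative names such as `ConvShape`, `Tiling`, and the two `traffic_*` functions. It shows how a host-side scheduler could compare the data traffic of two candidate reuse schemes for a layer and pick the cheaper one; the paper's actual strategy may weigh more schemes and a more detailed cost model.

```cpp
#include <cstdint>
#include <cstdio>

// Ceiling division helper.
static inline uint64_t cdiv(uint64_t a, uint64_t b) { return (a + b - 1) / b; }

// Simplified convolution-layer shape (stride 1, input spatial size taken
// equal to the output spatial size, kernel halos ignored).
struct ConvShape { uint64_t n, m, h, w, k; };  // in-ch, out-ch, height, width, kernel
struct Tiling    { uint64_t tn, tm; };         // input / output channel tile sizes

// Output-reuse: partial sums stay on chip, so outputs are written once and
// weights are loaded once, but the input map is re-streamed for every
// output-channel tile.
uint64_t traffic_output_reuse(ConvShape s, Tiling t) {
    uint64_t weights = s.m * s.n * s.k * s.k;
    uint64_t inputs  = cdiv(s.m, t.tm) * s.n * s.h * s.w;
    uint64_t outputs = s.m * s.h * s.w;
    return weights + inputs + outputs;
}

// Input-reuse: each input-channel tile is loaded once, but partial output
// sums are written back and re-read on every pass over the input channels.
uint64_t traffic_input_reuse(ConvShape s, Tiling t) {
    uint64_t weights = s.m * s.n * s.k * s.k;
    uint64_t inputs  = s.n * s.h * s.w;
    uint64_t passes  = cdiv(s.n, t.tn);
    uint64_t outputs = (2 * passes - 1) * s.m * s.h * s.w;
    return weights + inputs + outputs;
}

int main() {
    // VGG16 conv3_1-like layer: 128 -> 256 channels, 56x56 output, 3x3 kernel.
    ConvShape s{128, 256, 56, 56, 3};
    Tiling    t{32, 32};  // illustrative tile sizes
    uint64_t a = traffic_output_reuse(s, t);
    uint64_t b = traffic_input_reuse(s, t);
    std::printf("output-reuse: %llu elements, input-reuse: %llu elements -> pick %s\n",
                (unsigned long long)a, (unsigned long long)b,
                a <= b ? "output-reuse" : "input-reuse");
    return 0;
}
```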
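Contribution (2) defines a dedicated CNN instruction set, but the abstract does not give its encoding. The following is a minimal sketch of what a per-layer instruction might look like; every field name, field width, and example address is an assumption made for illustration, not the paper's actual format. In this reading, "building a CNN" on the accelerator reduces to the host emitting one such descriptor per layer.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical operation codes; the real instruction set is not specified
// in the abstract.
enum class Op : uint8_t { Conv = 0, Pool = 1, Fc = 2, End = 3 };

// One fixed-size instruction describing a single layer.  Field names,
// widths, and the presence of DDR base addresses are all assumptions.
struct LayerInstr {
    Op       op;
    uint16_t in_ch, out_ch;
    uint16_t in_h, in_w;
    uint8_t  kernel, stride, pad;
    uint8_t  relu;                 // fuse ReLU after the layer?
    uint32_t weight_addr;          // DDR byte offsets
    uint32_t ifmap_addr, ofmap_addr;
};

// A network is an ordered instruction list that the host driver writes
// into the accelerator's instruction buffer.
std::vector<LayerInstr> build_tiny_net() {
    return {
        {Op::Conv, 3,  64, 224, 224, 3, 1, 1, 1, 0x000000, 0x100000, 0x200000},
        {Op::Pool, 64, 64, 224, 224, 2, 2, 0, 0, 0,        0x200000, 0x300000},
        {Op::End,  0,  0,  0,   0,   0, 0, 0, 0, 0,        0,        0},
    };
}
```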
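For the software-hardware co-configuration of contribution (3), the sketch below shows one plausible way to expose data bit width, MAC array parallelism, and intermediate buffer size as compile-time parameters and to sanity-check them against the target device before synthesis. The constant names, the per-MAC DSP estimate, and the RAM budget are assumptions; only the XCZU7EV (ZCU104) DSP48 count of 1728 is a device fact, and the on-chip RAM figure is an approximation.

```cpp
// Hypothetical configuration knobs (the abstract names the parameters but
// not how they are set; these constants are illustrative).
constexpr int DATA_WIDTH_BITS = 8;     // activation/weight bit width
constexpr int MAC_PARALLELISM = 1024;  // number of MAC units in the array
constexpr int INTER_BUF_KB    = 512;   // intermediate cache size, KiB

// Coarse resource model used only as a pre-synthesis sanity check.
constexpr int DSP_PER_MAC     = 1;     // placeholder; depends on bit width and DSP packing
constexpr int ZCU104_DSP      = 1728;  // DSP48E2 slices on the XCZU7EV
constexpr int ON_CHIP_RAM_KB  = 4700;  // approx. BRAM + URAM on the XCZU7EV, rounded

static_assert(MAC_PARALLELISM * DSP_PER_MAC <= ZCU104_DSP,
              "MAC array exceeds the DSP budget of the target device");
static_assert(INTER_BUF_KB <= ON_CHIP_RAM_KB,
              "intermediate buffer exceeds the on-chip RAM budget");

int main() { return 0; }  // configuration-only translation unit; nothing to run
```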
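As a consistency check on the reported figures, the stated energy efficiency follows directly from the peak throughput and power:

\[
\text{Energy efficiency} = \frac{\text{Peak throughput}}{\text{Power}} = \frac{180\ \text{GOPS}}{3.752\ \text{W}} \approx 47.97\ \text{GOPS/W}.
\]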