In recent years, intelligent algorithms represented by deep learning have been widely applied in fields such as computer vision, image processing, and pattern recognition. The convolutional neural network (CNN) is an important algorithmic structure in deep learning. In practice, graphics processing units (GPUs), application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs) are commonly used as hardware platforms to accelerate CNN computation. Among them, the FPGA has attracted attention for its abundant logic resources, reconfigurability, low power consumption, and high performance. In this context, this paper designs an FPGA-based parallel accelerator for convolutional neural networks. The main research contents are as follows.

First, through theoretical analysis of the CNN model, the performance and computational load of networks with different topologies are compared, and a suitable model, SqueezeNet, is selected and fine-tuned. On this basis, combined with the structural characteristics of CNNs, the parallelization strategies available during CNN inference are discussed, and the logic resources and storage requirements of model computation and data movement are analyzed.

Second, to address the problem of mapping CNN model parameters onto parallel FPGA hardware, this paper proposes a system architecture that accelerates CNN computation, designs feasible arithmetic units and storage modules, and optimizes the accelerator's computational load and data caching through hardware optimization techniques. Meanwhile, to prevent repeated data accesses from degrading the overall computing efficiency of the system, a producer-consumer model is proposed to fuse adjacent network layers, maximizing on-chip data reuse and minimizing external memory reads and writes.

Finally, based on the Xilinx ZYNQ-XC7Z020 SoC, an IP core for convolutional neural networks is designed with high-level synthesis (HLS) tools. According to the characteristics of each layer of the network, each functional block is designed and optimized with synthesis directives to improve computational efficiency. The accelerator system is synthesized and implemented with the Vivado development tools, completing the FPGA-based CNN accelerator design. The correctness of the accelerator is then verified and the overall resource utilization is analyzed. A comparison of recognition speed against GPU and CPU implementations demonstrates the performance and power-consumption advantages of the design.
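The sketches below illustrate, in hedged form, the main techniques summarized above; none of them is the thesis's actual code or data. As a concrete instance of the computational-load and storage analysis, the standard formulas for a single convolution layer (not figures quoted from the thesis) are, for an output feature map of size $H_{\mathrm{out}} \times W_{\mathrm{out}}$ with $C_{\mathrm{out}}$ output channels, $C_{\mathrm{in}}$ input channels, and a $K \times K$ kernel:

$$
\text{MACs} = H_{\mathrm{out}} \cdot W_{\mathrm{out}} \cdot C_{\mathrm{out}} \cdot C_{\mathrm{in}} \cdot K^{2},
\qquad
\text{weights} = C_{\mathrm{out}} \cdot C_{\mathrm{in}} \cdot K^{2}.
$$

For example, a $3 \times 3$ layer with $C_{\mathrm{in}} = C_{\mathrm{out}} = 64$ on a $56 \times 56$ output map requires about $1.16 \times 10^{8}$ multiply-accumulates while storing only $36{,}864$ weights. This imbalance between arithmetic and parameter storage is what motivates investing FPGA logic in parallel arithmetic units and reusing on-chip data aggressively.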
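The loop-level parallelization and directive optimization described above might look roughly like the following Vivado HLS style sketch. All identifiers, tile sizes, and parallelism factors here are illustrative assumptions, not the thesis's implementation; the pragmas (PIPELINE, UNROLL, ARRAY_PARTITION) are standard Vivado HLS directives.

```cpp
// Minimal sketch of a parallelized convolution processing element.
// All names and constants are assumptions for illustration only.
const int K     = 3;  // kernel size (assumed)
const int P_OUT = 4;  // output channels computed in parallel (assumed)
const int P_IN  = 4;  // input channels accumulated in parallel (assumed)

// Accumulates one K x K window into P_OUT partial sums; the caller is
// assumed to initialize acc[] and iterate over channel/pixel tiles.
void conv_pe(const float window[P_IN][K][K],
             const float weights[P_OUT][P_IN][K][K],
             float acc[P_OUT]) {
#pragma HLS ARRAY_PARTITION variable=weights complete dim=1
#pragma HLS ARRAY_PARTITION variable=acc complete
  for (int kr = 0; kr < K; ++kr) {
    for (int kc = 0; kc < K; ++kc) {
#pragma HLS PIPELINE II=1
      for (int oc = 0; oc < P_OUT; ++oc) {
#pragma HLS UNROLL
        for (int ic = 0; ic < P_IN; ++ic) {
#pragma HLS UNROLL
          // P_OUT * P_IN multiply-accumulates issued per cycle:
          // parallelism across the output and input channel dimensions.
          acc[oc] += window[ic][kr][kc] * weights[oc][ic][kr][kc];
        }
      }
    }
  }
}
```

Unrolling the two channel loops multiplies the number of concurrent multipliers, while pipelining the kernel loops keeps them fed every cycle; the partitioning directives split the arrays across block RAMs so the parallel reads do not contend for ports.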
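The producer-consumer layer fusion can be expressed in HLS as a dataflow region connected by an on-chip stream, so one layer's output feeds the next without a round trip to external DDR. hls::stream and the DATAFLOW directive are standard Vivado HLS constructs, but the toy layer bodies, names, and FIFO depth below are assumptions.

```cpp
// Sketch of producer-consumer layer fusion with Vivado HLS streams.
// Layer bodies are stand-ins; a real design would hold the loop nests
// sketched earlier.
#include <hls_stream.h>

const int N = 64;  // feature-map elements per call (assumed)

static void producer_layer(hls::stream<float>& in, hls::stream<float>& out) {
  for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
    out.write(in.read() * 0.5f);  // stand-in for convolution arithmetic
  }
}

static void consumer_layer(hls::stream<float>& in, hls::stream<float>& out) {
  for (int i = 0; i < N; i += 2) {
#pragma HLS PIPELINE II=1
    float a = in.read(), b = in.read();
    out.write(a > b ? a : b);     // stand-in for 1x2 max pooling
  }
}

// Both layers run concurrently; the intermediate feature map lives in a
// small on-chip FIFO instead of being written to and re-read from DDR.
void fused_layers(hls::stream<float>& in, hls::stream<float>& out) {
#pragma HLS DATAFLOW
  hls::stream<float> fmap("fmap");
#pragma HLS STREAM variable=fmap depth=32
  producer_layer(in, fmap);
  consumer_layer(fmap, out);
}
```

Because the consumer starts as soon as the first elements arrive in the FIFO, the fused pair overlaps in time and the external memory sees each feature map at most once, which is the data-reuse goal stated in the abstract.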