| In recent years, with the development of digital technology and the steady growth of available data, deep learning has emerged as a new research method in artificial intelligence and has proven capable and effective at solving complex learning problems. In particular, convolutional neural networks (CNNs) have made deep learning excel in image detection and target recognition. However, the increasing depth of neural networks demands a large amount of computation and memory bandwidth, which conventional CPUs cannot deliver at the required performance level. Manufacturers have therefore turned to hardware accelerators such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and graphics processing units (GPUs) to improve CNN throughput. Of the three, FPGAs can maximize parallelism and energy efficiency, which gives them broad application prospects in the low-power embedded field. Based on the ZedBoard development board of the Zynq platform, this paper implements a hardware acceleration model that accelerates the entire forward inference process. The paper first analyzes the structure of each layer of a convolutional neural network, focusing on the four levels of parallelism in the convolutional layer. It then analyzes the classic LeNet-5 network structure and, at the software level, uses a channel-based pruning optimization method to compress LeNet-5 so that it runs efficiently on the development board. Next, considering the hardware characteristics of the FPGA, parameters are stored and computed in 16-bit fixed-point format at the hardware level, and the hardware framework of each layer is optimized according to the network design. Pipelining is used in computation, ping-pong buffering in data transmission, and array partitioning in data storage. The use of these three optimization techniques allows each layer to have different degrees of 
parallelism and improves the overall operating efficiency of the accelerator. Finally, this paper uses 10,000 MNIST and Fashion-MNIST samples to test the accelerator in three respects: accuracy, speed, and power consumption. The results show that, at almost the same accuracy, the FPGA is 1.79 times faster than Intel's i5-1035G1 CPU and 9.96 times faster than the ARM CPU. For power consumption, the energy efficiency ratio (GOPs/W) is used as the reference indicator: the FPGA achieves 6.53 times that of the ARM CPU, 42.5 times that of the i5 CPU, and 2.46 times that of the GTX 1080 GPU. The experiments show that the FPGA accelerator designed in this paper has an excellent energy efficiency ratio and a short development time, and that it has development prospects in the low-power field. |
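
The 16-bit fixed-point storage and calculation mentioned above can be illustrated with a minimal sketch. The abstract does not state how the 16 bits are split between integer and fractional parts, so the Q8.8 split used here (`FRAC_BITS = 8`) is an assumption, and the helper names `to_fixed`, `to_float`, and `fixed_dot` are hypothetical and not taken from the paper.

```python
# Assumed Q8.8 format: 1 sign bit, 7 integer bits, 8 fractional bits.
# The paper only says "16-bit fixed-point"; the split is an illustration.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate(q: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, q))


def to_fixed(x: float) -> int:
    """Quantize a float to a saturated 16-bit fixed-point integer."""
    return saturate(int(round(x * SCALE)))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def fixed_dot(weights, activations) -> int:
    """Integer multiply-accumulate over fixed-point operands.

    Each product carries 2 * FRAC_BITS fractional bits, so the wide
    accumulator is shifted right once at the end and then saturated.
    """
    acc = 0
    for w, a in zip(weights, activations):
        acc += w * a
    return saturate(acc >> FRAC_BITS)


# Usage: 0.5 * 2.0 + 0.25 * 4.0 computed entirely in integers.
w = [to_fixed(0.5), to_fixed(0.25)]
x = [to_fixed(2.0), to_fixed(4.0)]
y = to_float(fixed_dot(w, x))
```

Accumulating in a wider integer and rescaling once, rather than after every product, limits rounding loss to a single truncation per output. Whether the accelerator described here does the same is not stated in the abstract.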