Convolutional Neural Networks (CNNs) are a representative class of deep learning algorithms. Because of their excellent recognition ability, they are widely used in computer vision, speech recognition, natural language processing and other fields. However, CNN models keep growing in scale to improve recognition accuracy, and their parameter counts and computational loads are increasing sharply. Traditional processors and systems cannot meet the growing tensor-computation and data-storage demands of CNN models, so efficient dedicated accelerators have become the foundation for deploying CNNs. Based on the computation and data-transfer characteristics of CNN models, this paper designs and implements a highly parallel accelerator with efficient data reuse. The main work of this paper is as follows:

(1) A dataflow with efficient data reuse is designed to reduce the cost of CNN data transfer. Activations and weights can move flexibly within the PE (Processing Element) array, enabling both temporal and spatial reuse of weights and activations. This architecture maximizes the number of operands retained inside the PE array for reuse, reducing repeated data transfers. In addition, a "zero-fill" pooling method and a data-rearrangement strategy are designed for this dataflow to reduce the complexity of accelerator control.

(2) A computing architecture with multiple parallelization strategies is designed to improve CNN computing efficiency. In this architecture, convolution-window-level parallelism is exploited across PEs, and convolution-kernel-level parallelism is exploited across PE arrays. The accelerator automatically generates the control instructions for each CNN layer according to that layer's computing requirements, improving control efficiency.

(3) The hardware accelerator is implemented and CNN acceleration experiments are carried out. The accelerator is implemented at the register-transfer level (RTL) in Verilog, and software programs for data preprocessing and postprocessing are written. The accelerator is simulated in ModelSim to verify its acceleration of CNN computation, and comparative experiments on PE arrays of different sizes and on different methods of transferring the edge data of window sets are carried out to verify the effectiveness of the design.

Simulation results show that the proposed CNN accelerator achieves an actual computing performance of 260.812 GOPS at a clock frequency of 200 MHz, roughly 7 times the performance of traditional processors, which meets the expected acceleration goal of the design.
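As a quick sanity check, the reported throughput can be converted into work per clock cycle. The second step assumes, as is common for accelerator GOPS figures but not stated in this abstract, that one multiply-accumulate (MAC) is counted as two operations:

$$
\frac{260.812 \times 10^{9}\ \text{ops/s}}{200 \times 10^{6}\ \text{cycles/s}} = 1304.06\ \text{ops/cycle} \approx 652\ \text{MACs/cycle}
$$

Under that assumption, the design sustains on the order of 650 MACs per cycle in simulation. This is a lower bound on the number of concurrently busy multipliers, not a PE count.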
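The sketch here makes the two parallelism levels and the data-reuse idea from (1) and (2) concrete in software. It is a hypothetical illustration under simplifying assumptions, not the thesis's dataflow: it uses a single input channel, stride 1, and no padding. `window_parallel_conv` models one PE per output window and one PE array per kernel, with every MAC fetching its activation from the buffer. `reuse_conv` models an idealized reuse scheme in which each activation is fetched once and delivered to every (kernel, window) pair that needs it. Both function names and the read-counting model are assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def window_parallel_conv(x: Matrix, kernels: List[Matrix]) -> Tuple[List[Matrix], int]:
    """Gather view (hypothetical): PE array n holds kernel n, PE (i, j) owns window (i, j).

    Loops run sequentially here; in hardware all (array, PE) pairs would run
    concurrently. Every MAC reads its activation from the buffer.
    Returns (output feature maps, activation reads).
    """
    H, W = len(x), len(x[0])
    R, S = len(kernels[0]), len(kernels[0][0])
    OH, OW = H - R + 1, W - S + 1
    reads = 0
    outputs = []
    for k in kernels:                      # kernel-level parallelism: one PE array per kernel
        fmap = [[0] * OW for _ in range(OH)]
        for i in range(OH):                # window-level parallelism: one PE per output window
            for j in range(OW):
                for r in range(R):
                    for s in range(S):
                        fmap[i][j] += x[i + r][j + s] * k[r][s]
                        reads += 1         # one buffer read per MAC, no reuse
        outputs.append(fmap)
    return outputs, reads


def reuse_conv(x: Matrix, kernels: List[Matrix]) -> Tuple[List[Matrix], int]:
    """Scatter view (hypothetical): each activation is read once, then reused.

    The single fetched value is delivered to every kernel (spatial reuse across
    arrays) and to every window that overlaps it (reuse across PEs).
    Returns (output feature maps, activation reads).
    """
    H, W = len(x), len(x[0])
    R, S = len(kernels[0]), len(kernels[0][0])
    OH, OW = H - R + 1, W - S + 1
    outputs = [[[0] * OW for _ in range(OH)] for _ in kernels]
    reads = 0
    for a in range(H):
        for b in range(W):
            v = x[a][b]
            reads += 1                     # the only buffer read for this activation
            for n, k in enumerate(kernels):
                for r in range(R):
                    for s in range(S):
                        i, j = a - r, b - s  # window whose (r, s) tap sees x[a][b]
                        if 0 <= i < OH and 0 <= j < OW:
                            outputs[n][i][j] += v * k[r][s]
    return outputs, reads


if __name__ == "__main__":
    x = [[(a * 5 + b) % 7 for b in range(5)] for a in range(5)]
    kernels = [
        [[1, 0, -1], [1, 0, -1], [1, 0, -1]],
        [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    ]
    gather_out, gather_reads = window_parallel_conv(x, kernels)
    scatter_out, scatter_reads = reuse_conv(x, kernels)
    assert gather_out == scatter_out       # same results, different data movement
    print(gather_reads, scatter_reads)     # 162 25
```

Both views compute identical feature maps. In this toy 5×5 input with two 3×3 kernels, the idealized reuse model drops activation reads from 162 to 25. A real PE array cannot hold every operand, so actual savings depend on array size and buffering, which is what the comparative experiments in (3) probe.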