In recent years, convolutional neural networks (CNNs) have been widely used in computer vision, natural language processing, and other fields. The large data volume of CNNs makes CNN inference a memory-intensive workload, which places new demands on the architecture of CNN processors. A CNN processor with a single memory has insufficient memory performance when executing CNN tasks, leading to the "memory wall" problem. By designing a NoC-based CNN inference processor with multiple off-chip memories, the physical memory bandwidth of the processor can be effectively increased, thereby alleviating the "memory wall" problem. However, when multi-memory CNN inference processors execute CNN tasks, memory access requests are unevenly distributed among the memories, resulting in low memory bandwidth utilization. It is therefore necessary to study multi-memory load-balance strategies for CNN inference processors to improve memory bandwidth utilization and overall efficiency.

According to their memory access characteristics, the data accessed in CNN inference tasks can be divided into read-only weight data and read-write input (output) neuron data. For both kinds of data, multi-memory CNN inference processors suffer from unbalanced memory access load across the memories. Since read-only weight data and read-write neuron data have different access characteristics, this paper proposes a different load-balance strategy for each:

(1) To improve the load balance of memory access requests for read-write neuron data, we propose a "sensing and dynamic-memory-access" strategy. The strategy lets the CNN inference processor sense per-memory load information and then send neuron-data requests to the memory with the lightest load. To verify its effectiveness, simulation models of CNN inference processors with a single memory and with four memories are built, and simulation experiments are carried out on LeNet-5 and AlexNet. The results show that the four-memory dynamic-memory-access processor reduces the inference latency of LeNet-5 and AlexNet by 63.13% and 35.64%, respectively, compared with the four-memory static-memory-access processor, and by 72.84% and 61.91%, respectively, compared with the single-memory processor. The proposed dynamic-memory-access strategy thus effectively improves the overall performance of multi-memory CNN inference processors by balancing the memory access requests of neuron data.

(2) To improve the load balance of memory access requests for read-only weight data, we propose a static-layout strategy for weight data that evenly allocates the memory access streams. By optimizing the distributed placement of weight data across the memories, the strategy makes parallel weight-data requests spread evenly among the memories. NoC simulation experiments show that the strategy reduces the inference latency of LeNet-5 and AlexNet by 11.55% and 21.16%, respectively, which demonstrates that the proposed static-layout strategy improves the overall performance of multi-memory CNN inference processors by balancing the memory access requests of weight data.
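To make the mechanism behind strategy (1) concrete, the following is a minimal Python sketch of a least-loaded dispatch policy for neuron-data requests. It is an illustration under stated assumptions, not the thesis's implementation: the class names (MemoryChannel, NeuronDispatcher) are invented here, the load metric (outstanding in-flight requests) is assumed, and the real processor realizes this sensing and routing in hardware over the NoC.

```python
# Illustrative sketch only: route each read-write neuron-data request to the
# memory with the lightest sensed load. Names and the load metric are
# assumptions; the thesis implements this in NoC hardware, not software.

class MemoryChannel:
    """Models one off-chip memory; 'outstanding' is the sensed load."""
    def __init__(self, channel_id):
        self.channel_id = channel_id
        self.outstanding = 0  # in-flight requests reported to the dispatcher

    def issue(self, request):
        self.outstanding += 1  # request enters this memory's queue

    def retire(self):
        self.outstanding -= 1  # a response came back; load drops


class NeuronDispatcher:
    """Sends each neuron-data request to the least-loaded memory."""
    def __init__(self, channels):
        self.channels = channels

    def dispatch(self, request):
        target = min(self.channels, key=lambda c: c.outstanding)
        target.issue(request)
        return target.channel_id


# Usage: a four-memory configuration, as in the simulated processor.
channels = [MemoryChannel(i) for i in range(4)]
dispatcher = NeuronDispatcher(channels)
for req in range(8):
    print("request", req, "-> memory", dispatcher.dispatch(req))
```

With equal service rates this policy degenerates to round-robin, but when one memory backs up (its `outstanding` count grows), new requests automatically flow to the less-loaded memories, which is the balancing effect the abstract describes.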
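Similarly, the sketch below illustrates the intent of strategy (2). The abstract does not detail the actual layout optimization, so the modulo (round-robin) interleaving of weight blocks shown here is an assumed stand-in for one even static placement; only the goal, parallel weight fetches landing on distinct memories, is taken from the text.

```python
# Illustrative sketch of an even static layout for read-only weight data:
# interleave weight blocks across the memories so that parallel fetches of
# consecutive blocks hit distinct memories. The modulo rule is an assumption,
# not the thesis's specific layout algorithm.

def layout_weights(num_blocks, num_memories=4):
    """Statically map weight block i to memory (i mod num_memories)."""
    return {block: block % num_memories for block in range(num_blocks)}


placement = layout_weights(num_blocks=10)
per_memory = [sum(1 for m in placement.values() if m == mem)
              for mem in range(4)]
print(placement)    # block -> memory
print(per_memory)   # blocks per memory, close to uniform: [3, 3, 2, 2]
```

Because weights are read-only, their access pattern is fixed at compile time, which is why a static layout (rather than the dynamic dispatch used for neuron data) suffices to balance the weight access streams.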