
Research On Memory Access Latency Optimization And Buffer Allocation Management In DNN Training Processor

Posted on: 2022-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: B. S. Wang
Full Text: PDF
GTID: 2518306563464094
Subject: Computer technology
Abstract/Summary:
DNN training processors are developing toward the integration of multiple cores and multiple memories, and interconnecting many computing nodes and memory nodes on chip through a Network-on-Chip is a future research trend. DNN training is a computationally intensive task that requires multiple computing nodes to cooperate, and it is layer-synchronized: when several on-chip computing nodes concurrently execute the computation of a convolutional or fully connected layer, training performance is bound by the slowest node (the straggler, or "short-board", effect). To prevent any single computing node from becoming such a bottleneck, memory access fairness among computing nodes must be ensured. At the same time, computing nodes are faster than memory nodes, and DNN training is also memory-intensive, so large numbers of memory access requests accumulate in the buffer queue at the memory access interface of the DNN training processor, where they wait to be serviced by the memory system. Effective management of this buffer queue therefore addresses two problems: an efficiency problem, reducing memory access latency, and a fairness problem, allocating only as much buffer as each node needs, both of which improve the training performance of the DNN training processor.

To address the efficiency problem, this thesis analyzes the memory access characteristics of the DNN training process and, further, the memory access latency of DDR4 SDRAM in different scenarios. The analysis shows that the main factors increasing memory access latency are read-after-write turnarounds and row conflicts. Based on this, a novel memory-access-request sorting module is designed at the memory system access interface (hereinafter, the memory access interface). The module reorders the requests arriving at the interface to reduce the number of read-after-write turnarounds and row conflicts while preserving the correctness of the training process, thereby reducing memory access latency. When training the LeNet-5 network on the MNIST dataset on the simulation platform, the requests generated during training are sorted at the memory access interface. Compared with the unsorted case, the memory access latency after sorting is significantly lower, with reductions of 19.93% to 32.87% across the different training scenarios.

To address the fairness problem, buffer resources must be managed efficiently. This thesis tackles fairness at the level of allocating the length of the memory-access-request queue at the memory access interface. To reduce buffer resource consumption, queue lengths are allocated dynamically: exploiting the periodicity of memory access traffic during training and the coordination between the sending rate and the service rate, a multi-period adaptive buffer queue-length allocation algorithm is proposed. Training performance is introduced as the evaluation metric, defined as the total number of clock cycles consumed to train one sample. In the simulation scenario of this thesis, a general buffer allocation algorithm is reproduced and compared with the proposed algorithm, yielding the relation between training performance and buffer size. Experimental results show that, for buffer sizes in the range [200, 260] packets, the multi-period adaptive algorithm outperforms the general buffer queue-length allocation algorithm, improving training performance by 5.10% to 6.27%. As the buffer size shrinks below 200 packets or grows beyond 260 packets, the two algorithms perform similarly; the proposed algorithm therefore achieves essentially equal training performance while using fewer buffer resources.

By sorting the memory access requests in the buffer queue at the memory access interface of the DNN training processor and dynamically allocating the buffer queue length, both the efficiency and the fairness of memory access are effectively addressed. This work provides theoretical analysis and experimental data to support the design of DNN training processors, and has both academic significance and engineering application value.
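The request-sorting idea described above can be sketched in software as follows. This is a minimal illustrative model, not the thesis's hardware design: the function name, the `(op, addr)` request format, and the toy single-bank row mapping (`addr >> col_bits`) are all assumptions made for illustration. It groups requests by DRAM row to reduce row conflicts and issues reads before writes within a row, except that a read depending on an earlier write to the same address is deferred so the result of training is unchanged.

```python
from collections import OrderedDict

def sort_requests(requests, col_bits=10):
    """Reorder memory requests to cut row conflicts and read/write turnarounds.

    Each request is an (op, addr) tuple with op in {"R", "W"}. A read that
    depends on an earlier write to the same address (a true read-after-write
    dependency) is kept after that write, so reordering preserves correctness.
    """
    # Group requests by DRAM row so same-row requests are serviced back to
    # back; OrderedDict preserves the order in which rows first appear.
    rows = OrderedDict()
    for i, (op, addr) in enumerate(requests):
        rows.setdefault(addr >> col_bits, []).append((i, op, addr))

    result = []
    for reqs in rows.values():
        reads = [r for r in reqs if r[1] == "R"]
        writes = [r for r in reqs if r[1] == "W"]
        # Reads with no earlier same-address write can safely go first;
        # reads with a true RAW dependency are deferred until after writes.
        safe, deferred = [], []
        for idx, op, addr in reads:
            raw = any(widx < idx and waddr == addr for widx, _, waddr in writes)
            (deferred if raw else safe).append((idx, op, addr))
        result.extend((op, addr) for _, op, addr in safe + writes + deferred)
    return result
```

In this toy model the reordered stream opens each DRAM row once and switches the bus direction at most once per row, which is the mechanism by which the thesis's sorting module reduces latency.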
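The multi-period adaptive allocation can likewise be sketched as a simple software model. This is a hedged illustration, not the thesis's algorithm: the function name, the headroom factor, and the "next period gets last period's peak occupancy plus headroom" heuristic are assumptions chosen to show how periodic traffic lets the queue length track demand instead of being fixed at the worst case.

```python
def adapt_queue_length(occupancy_trace, period, min_len=16, headroom=1.2):
    """Per-period adaptive queue-length allocation (illustrative sketch).

    occupancy_trace: observed buffer-queue occupancy per cycle for one node.
    period: length in cycles of one traffic period (e.g. one layer's pass).
    Returns the queue length allocated in each period: the previous period's
    peak occupancy scaled by a headroom factor, floored at min_len.
    """
    allocations = []
    alloc = min_len                      # start with a conservative allocation
    for start in range(0, len(occupancy_trace), period):
        allocations.append(alloc)
        peak = max(occupancy_trace[start:start + period])
        # "Allocate as much as necessary": grow after a busy period,
        # release buffer slots again when traffic is light.
        alloc = max(min_len, int(peak * headroom))
    return allocations
```

Because DNN training traffic repeats layer by layer, last period's peak is a good predictor of the next period's demand, so each node holds only the buffer it needs rather than a worst-case static share.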
Keywords/Search Tags:DNN training processor, memory access latency, sorting, buffer queue length allocation, training performance