
Research On CNN-Oriented Load-Store Instruction Data Width And Its Transmission Method

Posted on: 2022-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: F Z Zhang
Full Text: PDF
GTID: 2518306563475174
Subject: Computer Science and Technology
Abstract/Summary:
The Convolutional Neural Network (CNN), a typical class of deep neural network, is widely used in the field of artificial intelligence. CNN training is both computation-intensive and memory-intensive, characterized by large-scale parameter training, and this places demands on the design of dedicated CNN training processor architectures. Today's neural network training processors are usually on-chip many-core processors with abundant parallel computing resources. The severe imbalance between computing speed and memory access speed causes the "memory wall" problem: CNN training accesses memory intensively and must read and write memory frequently, while the memory wall restricts memory access operations and increases memory access latency. At the same time, the computational intensity of CNN training demands efficient data transmission among the on-chip cores, which consumes additional time and energy, so the "transmission wall" problem also becomes prominent. This thesis therefore starts from the architecture of a network-on-chip (NoC) based neural network training processor, focuses on the load-store instructions that perform data access and transmission in the processor, and optimizes their design for the characteristics of CNN training.

To alleviate the memory wall problem, the thesis studies, from the memory access perspective, how to reduce memory access latency and accelerate CNN training. Based on the data characteristics of the convolutional, pooling, and fully connected layers in a CNN, different load-store access data widths are designed for the different layer types, i.e., a mixed memory access data width. On a simulation platform, the LeNet-5 and AlexNet networks trained on the MNIST dataset are used to compare the proposed mixed-width load-store instructions against a conventional fixed memory access data width, with training latency as the evaluation metric. The results show that when the memory width is set in the range of 16 bits to 512 bits, CNN training latency with the mixed access width is lower than with a fixed access width: the training latency of LeNet-5 is reduced by 12.87% and that of AlexNet by 8.90% on average. The mixed memory access data width is most effective for convolutional layers, while its benefit in fully connected layers is comparatively small.
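To illustrate the mixed memory access data width idea described above, the following is a minimal sketch in Python. The per-layer width values and the function names are illustrative assumptions within the 16-512-bit range studied in the thesis, not values taken from it.

```python
# Minimal sketch of the mixed memory-access data width idea: each layer
# type is mapped to a load-store access width chosen from the 16..512-bit
# range mentioned in the abstract. The widths below are illustrative
# assumptions, not values from the thesis.

LAYER_WIDTH_BITS = {
    "conv": 512,  # convolutional layers: wide, regular streaming accesses
    "pool": 128,  # pooling layers: smaller working set (assumed)
    "fc":   64,   # fully connected layers: scattered weight accesses (assumed)
}

def loadstore_width(layer_type: str) -> int:
    """Return the access width (in bits) a load-store instruction would use."""
    return LAYER_WIDTH_BITS.get(layer_type, 64)  # fixed-width fallback

def loads_needed(tensor_bits: int, layer_type: str) -> int:
    """Number of load-store operations needed to move a tensor."""
    width = loadstore_width(layer_type)
    return -(-tensor_bits // width)  # ceiling division

if __name__ == "__main__":
    # A 256 KiB activation tensor needs far fewer wide loads in a
    # convolutional layer than narrow loads in a fully connected layer.
    bits = 256 * 1024 * 8
    for lt in ("conv", "pool", "fc"):
        print(lt, loads_needed(bits, lt))
```

The sketch makes the trade-off visible: wide accesses suit the regular data streams of convolutional layers, which is consistent with the abstract's observation that the mixed-width scheme helps convolutional layers most.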
To alleviate the on-chip transmission wall problem, the thesis studies, from the perspective of on-chip data transmission, the communication between memory and the processing cores, so that CNN model data can be transmitted efficiently over the on-chip network. Based on the load-store access data width, an analysis of CNN model access features, and the characteristics of the access traffic, three on-chip transmission optimization strategies are proposed: prioritizing by the instruction's access data width, prioritizing by the instruction's access type, and prioritizing by the distance between the processing element (PE) and the memory (DDR). LeNet-5 and AlexNet are then simulated to compare optimized against non-optimized transmission. The results show that, among the three strategies, PE-DDR distance priority performs best, instruction access data width priority second, and instruction access type priority last. The PE-DDR distance priority strategy is therefore an important way to address the on-chip transmission bottleneck in the CNN training process.
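As one plausible reading of the PE-DDR distance priority strategy, the sketch below orders pending transfers shortest-distance-first by Manhattan hop count on a 2-D mesh NoC. The mesh layout, the request format, and the shortest-first ordering are assumptions for illustration; the abstract does not specify these details.

```python
# Minimal sketch of the PE-DDR distance priority strategy: pending on-chip
# transfers are served shortest-first by Manhattan hop count on an assumed
# 2-D mesh NoC. Mesh layout, request format, and ordering direction are
# illustrative assumptions, not details from the thesis.

def manhattan(a, b):
    """Hop distance between two (x, y) mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def schedule_by_distance(requests, ddr_port=(0, 0)):
    """Order transfer requests so PEs closest to the DDR port go first.

    requests: list of (pe_xy, nbytes) tuples.
    """
    return sorted(requests, key=lambda r: manhattan(r[0], ddr_port))

if __name__ == "__main__":
    pending = [((3, 2), 4096), ((0, 1), 1024), ((2, 3), 2048)]
    for pe, nbytes in schedule_by_distance(pending):
        print(pe, nbytes)
```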
Keywords/Search Tags:CNN Training, Network-on-Chip, Memory Access Data Width, Training Latency, On-chip Transmission Optimization