
Research On CNN-Oriented Load-Store Instruction Data Width And Its Transmission Method

Posted on: 2022-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: F Z Zhang
Full Text: PDF
GTID: 2518306563475174
Subject: Computer Science and Technology
Abstract/Summary:
The Convolutional Neural Network (CNN), a typical class of deep neural network, is widely used in the field of artificial intelligence. CNN training is both computation-intensive and memory-intensive, characterized by large-scale parameter training, and this places demands on the design of dedicated CNN training processor architectures. Today's neural network training processors are usually on-chip many-core processors with abundant parallel computing resources. The severe imbalance between computing speed and memory access speed causes the "memory wall" problem: CNN training accesses memory intensively and must read and write memory frequently, while the memory wall restricts memory access operations and increases memory access latency. At the same time, the computational intensity of CNN training demands efficient data transmission among the on-chip cores, which consumes additional time and energy, so the "transmission wall" problem also becomes prominent. This thesis therefore starts from the architecture of a network-on-chip (NoC) based neural network training processor, focuses on the load-store instructions that perform data access and transmission in the processor, and optimizes their design for the characteristics of CNN training.

To alleviate the memory wall problem, the thesis studies, from the memory access perspective, how to reduce memory access latency and accelerate CNN training. Based on the data characteristics of the convolutional, pooling, and fully connected layers in a CNN, different load-store access data widths are designed for the different layer types, i.e., a mixed memory access data width. On a simulation platform, the LeNet-5 and AlexNet networks trained on the MNIST dataset are used to compare the proposed mixed-width load-store instructions against a conventional fixed memory access data width, with training latency as the evaluation metric. The results show that when the memory width is set in the range of 16 bits to 512 bits, CNN training latency with the mixed access width is lower than with a fixed access width: the training latency of LeNet-5 is reduced by 12.87% and that of AlexNet by 8.90% on average. The mixed memory access data width is most effective for convolutional layers, while its benefit in fully connected layers is comparatively small.
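To illustrate the mixed memory access data width idea described above, the following is a minimal sketch in Python. The per-layer width values and the function names are illustrative assumptions within the 16-512-bit range studied in the thesis, not values taken from it.

```python
# Minimal sketch of the mixed memory-access data width idea: each layer
# type is mapped to a load-store access width chosen from the 16..512-bit
# range mentioned in the abstract. The widths below are illustrative
# assumptions, not values from the thesis.

LAYER_WIDTH_BITS = {
    "conv": 512,  # convolutional layers: wide, regular streaming accesses
    "pool": 128,  # pooling layers: smaller working set (assumed)
    "fc":   64,   # fully connected layers: scattered weight accesses (assumed)
}

def loadstore_width(layer_type: str) -> int:
    """Return the access width (in bits) a load-store instruction would use."""
    return LAYER_WIDTH_BITS.get(layer_type, 64)  # fixed-width fallback

def loads_needed(tensor_bits: int, layer_type: str) -> int:
    """Number of load-store operations needed to move a tensor."""
    width = loadstore_width(layer_type)
    return -(-tensor_bits // width)  # ceiling division

if __name__ == "__main__":
    # A 256 KiB activation tensor needs far fewer wide loads in a
    # convolutional layer than narrow loads in a fully connected layer.
    bits = 256 * 1024 * 8
    for lt in ("conv", "pool", "fc"):
        print(lt, loads_needed(bits, lt))
```

The sketch makes the trade-off visible: wide accesses suit the regular data streams of convolutional layers, which is consistent with the abstract's observation that the mixed-width scheme helps convolutional layers most.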
To alleviate the on-chip transmission wall problem, the thesis studies, from the perspective of on-chip data transmission, the communication between memory and the processing cores, so that CNN model data can be transmitted efficiently over the on-chip network. Based on the load-store access data width, an analysis of CNN model access features, and the characteristics of the access traffic, three on-chip transmission optimization strategies are proposed: prioritizing by the instruction's access data width, prioritizing by the instruction's access type, and prioritizing by the distance between the processing element (PE) and the memory (DDR). LeNet-5 and AlexNet are then simulated to compare optimized against non-optimized transmission. The results show that, among the three strategies, PE-DDR distance priority performs best, instruction access data width priority second, and instruction access type priority last. The PE-DDR distance priority strategy is therefore an important way to address the on-chip transmission bottleneck in the CNN training process.
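As one plausible reading of the PE-DDR distance priority strategy, the sketch below orders pending transfers shortest-distance-first by Manhattan hop count on a 2-D mesh NoC. The mesh layout, the request format, and the shortest-first ordering are assumptions for illustration; the abstract does not specify these details.

```python
# Minimal sketch of the PE-DDR distance priority strategy: pending on-chip
# transfers are served shortest-first by Manhattan hop count on an assumed
# 2-D mesh NoC. Mesh layout, request format, and ordering direction are
# illustrative assumptions, not details from the thesis.

def manhattan(a, b):
    """Hop distance between two (x, y) mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def schedule_by_distance(requests, ddr_port=(0, 0)):
    """Order transfer requests so PEs closest to the DDR port go first.

    requests: list of (pe_xy, nbytes) tuples.
    """
    return sorted(requests, key=lambda r: manhattan(r[0], ddr_port))

if __name__ == "__main__":
    pending = [((3, 2), 4096), ((0, 1), 1024), ((2, 3), 2048)]
    for pe, nbytes in schedule_by_distance(pending):
        print(pe, nbytes)
```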
Keywords/Search Tags:CNN Training, Network-on-Chip, Memory Access Data Width, Training Latency, On-chip Transmission Optimization