
Research On Storage And Computing Optimization Technology In Deep Learning Accelerator

Posted on: 2019-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Nie
Full Text: PDF
GTID: 2428330611993518
Subject: Computer Science and Technology
Abstract/Summary:
Deep Convolutional Neural Networks (DNNs) make high-precision predictive decisions and are widely used in areas such as speech recognition, image recognition, and natural language processing. Convolutional neural networks are both computation-intensive and storage-intensive. Although highly parallel devices effectively meet the computational demand, energy efficiency remains a problem to be addressed.

In the foundational phase of this work, we first completed a simulation framework with a three-level storage hierarchy (network, on-chip cache, off-chip storage) that can simulate multiple accelerator structures and evaluate the performance of various combinations of computation sequences and data layouts. We then implemented a convolutional neural network accelerator with a systolic array structure that balances I/O with computational speed and performs convolution operations in parallel. On this basis, we optimize the storage and computation of the accelerator.

For computational optimization, we design a PE structure that exploits weight repetition, using a two-stage pipeline that first accumulates and then multiplies. This reduces the number of multiplication operations and thus the on-chip computational power consumption. At the same time, quantized weight indices, rather than the weights themselves, flow through the pipeline, reducing bandwidth requirements.

For storage optimization, we propose two new convolution computation modes: NHWCfine and NHWCcoarse. Building on the fact that the weights can be cached on chip, feature-map reuse is fully exploited to reduce off-chip accesses to the feature maps. In addition, feature-map data is rearranged to match the new computation sequence, improving locality and providing a contiguous memory access pattern that maximizes access coalescing and bandwidth utilization.

Experiments on various convolutional layers show that the proposed modes, each a combination of a computation sequence and a data layout, are more energy efficient than the baseline mode across a range of networks: total energy consumption is reduced by up to 4.10×, and off-chip memory access latency by up to 5.11×. Moreover, the deeper the network, the more pronounced the optimization effect becomes.
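The abstract does not give implementation details of the systolic array, so the following Python sketch is only a minimal, generic model of an output-stationary systolic array computing a matrix product (the form to which convolution is commonly lowered); the skewed injection schedule and all names here are assumptions for illustration, not the thesis's actual design.

    def systolic_matmul(A, B):
        """Cycle-level model of an M x N output-stationary systolic array.

        Rows of A stream in from the left and columns of B from the top,
        each skewed by one cycle per row/column; every PE performs one
        multiply-accumulate per cycle, so compute and I/O stay balanced
        once the array is full.
        """
        M, K, N = len(A), len(A[0]), len(B[0])
        C = [[0] * N for _ in range(M)]          # accumulators held in the PEs
        a_reg = [[0] * N for _ in range(M)]      # A values moving rightwards
        b_reg = [[0] * N for _ in range(M)]      # B values moving downwards
        for t in range(M + N + K - 2):           # fill + steady state + drain
            for i in range(M):                   # shift A values one PE right
                for j in range(N - 1, 0, -1):
                    a_reg[i][j] = a_reg[i][j - 1]
            for j in range(N):                   # shift B values one PE down
                for i in range(M - 1, 0, -1):
                    b_reg[i][j] = b_reg[i - 1][j]
            for i in range(M):                   # skewed injection from the left
                k = t - i
                a_reg[i][0] = A[i][k] if 0 <= k < K else 0
            for j in range(N):                   # skewed injection from the top
                k = t - j
                b_reg[0][j] = B[k][j] if 0 <= k < K else 0
            for i in range(M):                   # one MAC per PE per cycle
                for j in range(N):
                    C[i][j] += a_reg[i][j] * b_reg[i][j]
        return C

    assert systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]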
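To make the accumulate-first, multiply-later idea concrete, here is a minimal Python sketch of a dot product that exploits weight repetition; the grouping by quantized weight index and the codebook structure are assumptions about how such a PE could operate, not the exact two-stage pipeline of the thesis.

    from collections import defaultdict

    def dot_product_weight_repetition(activations, weight_indices, codebook):
        """Stage 1: add up all activations sharing a quantized weight index.
        Stage 2: multiply each group sum by its weight value, once per
        distinct weight rather than once per element."""
        group_sums = defaultdict(float)
        for a, idx in zip(activations, weight_indices):
            group_sums[idx] += a                 # additions only in stage 1
        return sum(codebook[idx] * s for idx, s in group_sums.items())

    # 8 inputs but only 3 distinct weights: 8 multiplies shrink to 3.
    acts = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 0.25, 4.0]
    idxs = [0, 1, 0, 2, 1, 2, 0, 1]
    codebook = {0: 0.5, 1: -1.0, 2: 0.75}
    print(dot_product_weight_repetition(acts, idxs, codebook))

Passing the small indices (here 2 bits each) instead of full-precision weights through the pipeline is also what cuts the bandwidth requirement mentioned above.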
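The abstract does not define NHWCfine and NHWCcoarse precisely, so the sketch below only illustrates the NHWC linearization both variants presumably build on, and why it yields the contiguous, coalescible access pattern described above; the stride comparison with NCHW is illustrative, not taken from the thesis.

    def addr_nchw(n, c, h, w, C, H, W):
        # channels-first: consecutive w positions are adjacent in memory,
        # but consecutive channels of one pixel are H*W elements apart
        return ((n * C + c) * H + h) * W + w

    def addr_nhwc(n, c, h, w, C, H, W):
        # channels-last: all C channels of one pixel are adjacent, so a
        # convolution reading a whole pixel issues one contiguous burst
        return ((n * H + h) * W + w) * C + c

    C, H, W = 64, 32, 32
    # stride between channel c and c+1 at a fixed pixel position:
    print(addr_nchw(0, 1, 0, 0, C, H, W) - addr_nchw(0, 0, 0, 0, C, H, W))  # 1024
    print(addr_nhwc(0, 1, 0, 0, C, H, W) - addr_nhwc(0, 0, 0, 0, C, H, W))  # 1

A fine-grained and a coarse-grained mode would then plausibly differ in how large a contiguous block of this layout is fetched per off-chip transaction.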
Keywords/Search Tags:Deep Learning, Convolutional Neural Network, Acceleration, Data Layout, Weight Repetition