
Research On Storage And Computing Optimization Technology In Deep Learning Accelerator

Posted on: 2019-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Nie
Full Text: PDF
GTID: 2428330611993518
Subject: Computer Science and Technology
Abstract/Summary:
Deep Convolutional Neural Networks (DNNs) make high-precision predictive decisions and are widely used in areas such as speech recognition, image recognition, and natural language processing. Convolutional neural networks are both computation-intensive and storage-intensive. Although highly parallel devices effectively meet the computational demand, energy efficiency remains a problem to be addressed.

In the foundational phase of this work, we first completed a simulation framework with a three-level storage hierarchy (network, on-chip cache, off-chip storage) that can simulate multiple accelerator structures and evaluate the performance of various combinations of computation sequences and data layouts. We then implemented a convolutional neural network accelerator with a systolic array structure that balances I/O with computational speed and performs convolution operations in parallel. On this basis, we optimize the storage and computation of the accelerator.

For computational optimization, we design a PE structure that exploits weight repetition, using a two-stage pipeline that first accumulates and then multiplies. This reduces the number of multiplication operations and thus the on-chip computational power consumption. At the same time, quantized weight indices, rather than the weights themselves, flow through the pipeline, reducing bandwidth requirements.

For storage optimization, we propose two new convolution computation modes: NHWCfine and NHWCcoarse. Building on the fact that the weights can be cached on chip, feature-map reuse is fully exploited to reduce off-chip accesses to the feature maps. In addition, feature-map data is rearranged to match the new computation sequence, improving locality and providing a contiguous memory access pattern that maximizes access coalescing and bandwidth utilization.

Experiments on various convolutional layers show that the proposed modes, each a combination of a computation sequence and a data layout, are more energy efficient than the baseline mode across a range of networks: total energy consumption is reduced by up to 4.10×, and off-chip memory access latency by up to 5.11×. Moreover, the deeper the network, the more pronounced the optimization effect becomes.
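The abstract does not give implementation details of the systolic array, so the following Python sketch is only a minimal, generic model of an output-stationary systolic array computing a matrix product (the form to which convolution is commonly lowered); the skewed injection schedule and all names here are assumptions for illustration, not the thesis's actual design.

    def systolic_matmul(A, B):
        """Cycle-level model of an M x N output-stationary systolic array.

        Rows of A stream in from the left and columns of B from the top,
        each skewed by one cycle per row/column; every PE performs one
        multiply-accumulate per cycle, so compute and I/O stay balanced
        once the array is full.
        """
        M, K, N = len(A), len(A[0]), len(B[0])
        C = [[0] * N for _ in range(M)]          # accumulators held in the PEs
        a_reg = [[0] * N for _ in range(M)]      # A values moving rightwards
        b_reg = [[0] * N for _ in range(M)]      # B values moving downwards
        for t in range(M + N + K - 2):           # fill + steady state + drain
            for i in range(M):                   # shift A values one PE right
                for j in range(N - 1, 0, -1):
                    a_reg[i][j] = a_reg[i][j - 1]
            for j in range(N):                   # shift B values one PE down
                for i in range(M - 1, 0, -1):
                    b_reg[i][j] = b_reg[i - 1][j]
            for i in range(M):                   # skewed injection from the left
                k = t - i
                a_reg[i][0] = A[i][k] if 0 <= k < K else 0
            for j in range(N):                   # skewed injection from the top
                k = t - j
                b_reg[0][j] = B[k][j] if 0 <= k < K else 0
            for i in range(M):                   # one MAC per PE per cycle
                for j in range(N):
                    C[i][j] += a_reg[i][j] * b_reg[i][j]
        return C

    assert systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]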
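To make the accumulate-first, multiply-later idea concrete, here is a minimal Python sketch of a dot product that exploits weight repetition; the grouping by quantized weight index and the codebook structure are assumptions about how such a PE could operate, not the exact two-stage pipeline of the thesis.

    from collections import defaultdict

    def dot_product_weight_repetition(activations, weight_indices, codebook):
        """Stage 1: add up all activations sharing a quantized weight index.
        Stage 2: multiply each group sum by its weight value, once per
        distinct weight rather than once per element."""
        group_sums = defaultdict(float)
        for a, idx in zip(activations, weight_indices):
            group_sums[idx] += a                 # additions only in stage 1
        return sum(codebook[idx] * s for idx, s in group_sums.items())

    # 8 inputs but only 3 distinct weights: 8 multiplies shrink to 3.
    acts = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 0.25, 4.0]
    idxs = [0, 1, 0, 2, 1, 2, 0, 1]
    codebook = {0: 0.5, 1: -1.0, 2: 0.75}
    print(dot_product_weight_repetition(acts, idxs, codebook))

Passing the small indices (here 2 bits each) instead of full-precision weights through the pipeline is also what cuts the bandwidth requirement mentioned above.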
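The abstract does not define NHWCfine and NHWCcoarse precisely, so the sketch below only illustrates the NHWC linearization both variants presumably build on, and why it yields the contiguous, coalescible access pattern described above; the stride comparison with NCHW is illustrative, not taken from the thesis.

    def addr_nchw(n, c, h, w, C, H, W):
        # channels-first: consecutive w positions are adjacent in memory,
        # but consecutive channels of one pixel are H*W elements apart
        return ((n * C + c) * H + h) * W + w

    def addr_nhwc(n, c, h, w, C, H, W):
        # channels-last: all C channels of one pixel are adjacent, so a
        # convolution reading a whole pixel issues one contiguous burst
        return ((n * H + h) * W + w) * C + c

    C, H, W = 64, 32, 32
    # stride between channel c and c+1 at a fixed pixel position:
    print(addr_nchw(0, 1, 0, 0, C, H, W) - addr_nchw(0, 0, 0, 0, C, H, W))  # 1024
    print(addr_nhwc(0, 1, 0, 0, C, H, W) - addr_nhwc(0, 0, 0, 0, C, H, W))  # 1

A fine-grained and a coarse-grained mode would then plausibly differ in how large a contiguous block of this layout is fetched per off-chip transaction.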
Keywords/Search Tags:Deep Learning, Convolutional Neural Network, Acceleration, Data Layout, Weight Repetition