Efficient And Reconfigurable Deep Convolutional Neural Network Acceleration System With 3D Stacked Memory

Posted on:2021-08-30

Degree:Master

Type:Thesis

Country:China

Candidate:Q Y Cheng

Full Text:PDF

GTID:2518306104987989

Subject:Computer system architecture

Abstract/Summary:

Deep convolutional neural networks (DCNNs) are widely used for machine vision tasks, including object detection and scene labeling. DCNNs are both compute- and memory-intensive, and their model structures are complex and diverse, which makes heterogeneous acceleration challenging. Current DCNN accelerators use a rigid dataflow to process different DCNN models under limited on-board resources, resulting in poor performance and energy efficiency. To address this problem, this thesis proposes FlexTetris, a flexible and reconfigurable DCNN acceleration system that jointly optimizes energy consumption and performance. FlexTetris adopts a near-data processing architecture based on 3D stacked memory, which moves computation closer to where the data is stored. 3D stacked memory supports large-capacity data storage and high-bandwidth, low-power DRAM access. In addition, a large-scale processing element (PE) array is integrated on the logic die of the 3D stacked memory, providing highly concurrent processing for DCNN workloads. FlexTetris adopts a flexible dataflow scheduling strategy that exploits the characteristics of 3D stacked memory and DCNN-specific data reuse to alleviate the energy and performance bottlenecks that arise in the 3D stacked memory scenario. Within each PE, data flows into the multiply-accumulate unit in a specific order, and data reuse within the multiply-accumulate unit relieves the energy bottleneck. FlexTetris uses a grouping map, which unrolls multi-dimensional data, to distribute computing tasks across PEs; this raises overall PE utilization and alleviates the performance bottleneck. Meanwhile, loop blocking and reordering strategies optimize data transfers between the levels of FlexTetris's multi-level storage hierarchy, further alleviating the energy bottleneck. Finally, FlexTetris implements an energy efficiency analysis tool on the host. The tool is used to find the most energy-efficient scheduling scheme for each DCNN model, and the control unit of the FlexTetris system is then reconfigured to support that scheme. The experiments use a variety of DCNN models. The results show that, compared with Tetris, a DCNN acceleration system based on 3D stacked memory, FlexTetris reduces average energy consumption by 31.4% and improves average performance by 12%. Compared with a DCNN acceleration scenario using low-power DRAM, FlexTetris reduces average energy consumption by 43.9% and improves average performance by 10%. The flexibility of FlexTetris therefore allows it to benefit from the characteristics of 3D stacked memory.
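
As a rough illustration of the loop blocking and reordering idea mentioned in the abstract, the following Python sketch compares a direct convolution loop nest with a blocked one. It is not the thesis's scheduler: the function names `conv_naive` and `conv_blocked`, the tile sizes `tm` and `tc`, and the choice to block only the output-channel and input-channel loops are all assumptions made for illustration. The point is only that blocking changes the order in which data is touched, so that a small tile of filters and input channels can stay in a near buffer, while the computed result stays the same.

```python
from itertools import product


def conv_naive(inp, wts):
    """Direct convolution (stride 1, no padding).

    inp: [C][H][W] input feature maps, wts: [M][C][K][K] filters.
    Returns out: [M][H-K+1][W-K+1].
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(wts), len(wts[0][0])
    OH, OW = H - K + 1, W - K + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m, c, y, x, i, j in product(range(M), range(C), range(OH),
                                    range(OW), range(K), range(K)):
        out[m][y][x] += inp[c][y + i][x + j] * wts[m][c][i][j]
    return out


def conv_blocked(inp, wts, tm=2, tc=2):
    """Same convolution with the channel loops blocked (tiled).

    tm and tc are hypothetical tile sizes for output and input channels.
    The two outer loops pick one tile; the inner loops then reuse only
    the filters and input maps that belong to that tile.
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(wts), len(wts[0][0])
    OH, OW = H - K + 1, W - K + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m0 in range(0, M, tm):            # tile of output channels
        for c0 in range(0, C, tc):        # tile of input channels
            for m in range(m0, min(m0 + tm, M)):
                for c in range(c0, min(c0 + tc, C)):
                    for y, x, i, j in product(range(OH), range(OW),
                                              range(K), range(K)):
                        out[m][y][x] += inp[c][y + i][x + j] * wts[m][c][i][j]
    return out
```

For example, with a single 2x2 input map of ones and a single 2x2 filter of ones, both `conv_naive(inp, wts)` and `conv_blocked(inp, wts)` return `[[[4]]]`. A scheduler of the kind the abstract describes would additionally choose the tile sizes and loop order to minimize data movement; that search is not modeled here.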

Keywords/Search Tags:

3D-stacked memory, Deep convolutional neural networks, Heterogeneous acceleration, Dataflow scheduling, Performance, Energy consumption

Related items

1	Design And Implementation Of Deep Convolutional Neural Networks Acceleration System Based On Heterogeneous Processor
2	Research On Heterogeneous Reconfigurable Dataflow Accelerator For Big Data Applications
3	Research On Deep Neural Networks Based Classification And Representation Learning Of Heterogeneous Networks
4	Dataflow Runtime System On Heterogeneous Convergence Platform
5	Study Of Heterogeneous Multi-core Acceleration Methods For Convolutional Neural Networks On Reconfigurable Platform
6	Research On Acceleration Method Of Deep Convolutional Neural Network Based On Heterogeneous Computing Platform
7	Research On Neural Network Compilation And Acceleration Technology Based On FPGA
8	Study On Acceleration Of Deep Convolutional Neural Network With Pruning
9	Research On Compression And Acceleration Of Deep Convolutional Neural Networks
10	On Stacked And Deep Neural Network With The Application Of Speech Separation