Efficient And Reconfigurable Deep Convolutional Neural Network Acceleration System With 3D Stacked Memory

Posted on: 2021-08-30, Degree: Master, Type: Thesis
Country: China, Candidate: Q Y Cheng
GTID: 2518306104987989, Subject: Computer system architecture
Abstract/Summary:
Deep convolutional neural networks (DCNNs) are widely used for machine vision tasks, including target detection and scene labeling. DCNNs are both compute- and memory-intensive, and their model structures are complex and diverse. These characteristics make heterogeneous acceleration challenging. Current DCNN accelerators use a rigid dataflow to process different DCNN models under limited on-board resources, which results in poor performance and energy efficiency. To solve this problem, this thesis proposes FlexTetris, a flexible and reconfigurable DCNN acceleration system that jointly optimizes energy consumption and performance. FlexTetris adopts a near-data-processing architecture based on 3D stacked memory, which moves computation closer to where the data is stored. 3D stacked memory provides large-capacity storage together with high-bandwidth, low-power DRAM access. In addition, a large-scale array of processing elements (PEs) is integrated on the logic die of the 3D stacked memory, providing highly concurrent processing for DCNN workloads. FlexTetris adopts a flexible dataflow scheduling strategy that exploits both the characteristics of 3D stacked memory and the data reuse specific to DCNNs, alleviating the energy and performance bottlenecks that arise in the 3D stacked memory scenario. Within each PE, data flows into the multiply-accumulate unit in a specific order, and data reuse inside the multiply-accumulate unit effectively relieves the energy bottleneck. FlexTetris uses a grouped mapping that unrolls multi-dimensional data to distribute computing tasks across PEs, which raises overall PE utilization and alleviates the performance bottleneck. Meanwhile, loop blocking and loop reordering strategies optimize data transfers between the levels of FlexTetris's multi-level storage hierarchy, further alleviating the energy bottleneck. Finally, FlexTetris implements an energy efficiency analysis tool on the host. The tool finds the most energy-efficient scheduling scheme for each DCNN model, and the control unit of the FlexTetris system is then reconfigured to support that scheme. The experiments test a variety of DCNN models. The results show that, compared with Tetris, a DCNN acceleration system based on 3D stacked memory, FlexTetris reduces average energy by 31.4% and improves average performance by 12%. Compared with a DCNN acceleration scenario using low-power DRAM, FlexTetris reduces average energy by 43.9% and improves average performance by 10%. The flexibility of FlexTetris therefore allows it to benefit from the characteristics of 3D stacked memory.
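
The loop blocking and reordering mentioned in the abstract can be illustrated with a minimal sketch. This is not the FlexTetris scheduler: the function names, tensor layout, tile sizes, and the choice to block the output-channel and input-channel loops are all assumptions made for illustration. The sketch only shows that blocking and reordering change the order of memory accesses, and hence the reuse of data held in a small buffer, without changing the result.

```python
def conv2d_naive(inp, weights):
    """Direct convolution, stride 1, no padding (hypothetical layout).

    inp: [C][H][W], weights: [K][C][R][S] -> out: [K][H-R+1][W-S+1]
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(K)]
    for k in range(K):
        for c in range(C):
            for y in range(OH):
                for x in range(OW):
                    for r in range(R):
                        for s in range(S):
                            out[k][y][x] += inp[c][y + r][x + s] * weights[k][c][r][s]
    return out


def conv2d_blocked(inp, weights, tile_k=2, tile_c=2):
    """Same computation with the k and c loops blocked (tiled).

    The outer k0/c0 loops pick a tile of output and input channels, so the
    weights and feature maps of one tile could stay in a small on-chip
    buffer while the inner loops reuse them. Tile sizes are illustrative.
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(K)]
    for k0 in range(0, K, tile_k):            # block over output channels
        for c0 in range(0, C, tile_c):        # block over input channels
            for k in range(k0, min(k0 + tile_k, K)):
                for c in range(c0, min(c0 + tile_c, C)):
                    for y in range(OH):
                        for x in range(OW):
                            for r in range(R):
                                for s in range(S):
                                    out[k][y][x] += (
                                        inp[c][y + r][x + s] * weights[k][c][r][s]
                                    )
    return out
```

Both functions accumulate exactly the same products, only in a different order, so for integer inputs they return identical outputs. A scheduling tool of the kind described in the abstract would search over such tile sizes and loop orders for the most energy-efficient choice.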
Keywords/Search Tags: 3D-stacked memory, Deep convolutional neural networks, Heterogeneous acceleration, Dataflow scheduling, Performance, Energy consumption