Design And Verification Of 3D CNN Accelerator Using Reusability Of Data

Posted on: 2021-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: X X Wu
Full Text: PDF
GTID: 2518306557987039
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Three-dimensional convolution (3D CNN) and three-dimensional deconvolution (3D DCNN) are widely used in algorithms such as motion recognition, video generation, and stereo matching. However, existing accelerators leave room to improve performance and reduce energy consumption. In particular, when they run 3D convolution workloads that contain both 3D convolution and 3D deconvolution, a large number of ineffectual multiply-add operations on zeros and redundant memory accesses incur significant overhead in both performance and energy. Therefore, a 3D CNN accelerator that exploits data reusability is proposed from the perspective of dataflow, with the goal of achieving high performance and low energy consumption. In this work, a description scheme for accelerator dataflow and mapping is introduced. With this scheme, an accelerator dataflow is described by its loop order and parallelization, and a mapping is described by parameters such as tile size. On the basis of this description scheme, a dataflow and an accelerator, Uarch1, are proposed for 3D CNNs. In Uarch1, data reusability is exploited to maximize data reuse in local memory, thereby minimizing accesses to the high-cost memory levels. To evaluate the energy consumption and performance of the proposed accelerator, Uarch-compiler and Uarch-sim are proposed. Evaluation results from digital-circuit EDA tools support the validity of these software tools. Finally, an accelerator, Uarch2, is proposed for 3D convolution workloads that contain both 3D convolutions and 3D deconvolutions. In Uarch2, a deconvolution transformation is used to reduce the computation and memory accesses of 3D deconvolution layers, and a fused-layer approach is used to reduce expensive memory accesses when Uarch2 executes modules with a branch structure. Evaluation with five classic 3D convolution workloads shows that, on average, Uarch2 achieves 11% and 28% performance speedup and 60% and 43% reduction in energy consumption compared with the deconvolution accelerator GANAX and the 3D convolution accelerator Systolic cube, respectively. With SMIC 40 nm technology, the proposed accelerator achieves a throughput of 459.4 GOPS and an energy efficiency of 1.2 TOPS/W.
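The overhead from multiply-adds on zeros mentioned in the abstract can be illustrated with a small sketch. It assumes the common zero-insertion formulation of deconvolution (upsample the input with zeros, then apply an ordinary stride-1 convolution) and omits border padding. The function names `zero_insert_3d` and `count_macs` are hypothetical, and this is not the thesis's deconvolution transformation, whose details the abstract does not give; the sketch only counts how many multiply-adds would touch a nonzero input.

```python
from itertools import product

def zero_insert_3d(x, stride):
    """Upsample a D x H x W nested list by placing (stride - 1) zeros
    between neighbouring elements along every axis."""
    d, h, w = len(x), len(x[0]), len(x[0][0])
    od, oh, ow = (d - 1) * stride + 1, (h - 1) * stride + 1, (w - 1) * stride + 1
    out = [[[0] * ow for _ in range(oh)] for _ in range(od)]
    for i, j, k in product(range(d), range(h), range(w)):
        out[i * stride][j * stride][k * stride] = x[i][j][k]
    return out

def count_macs(x, ksize):
    """Slide a ksize^3 window over x (stride 1, no padding) and return
    (total multiply-adds, multiply-adds whose input operand is nonzero)."""
    d, h, w = len(x), len(x[0]), len(x[0][0])
    total = useful = 0
    for i, j, k in product(range(d - ksize + 1), range(h - ksize + 1), range(w - ksize + 1)):
        for a, b, c in product(range(ksize), repeat=3):
            total += 1
            if x[i + a][j + b][k + c] != 0:
                useful += 1
    return total, useful

# Assumed toy sizes: a 4x4x4 all-ones input, stride 2, 3x3x3 kernel.
x = [[[1] * 4 for _ in range(4)] for _ in range(4)]
total, useful = count_macs(zero_insert_3d(x, 2), 3)
print(total, useful, useful / total)
```

With these toy sizes, only about 15% of the multiply-adds operate on a nonzero input, which is the kind of waste a deconvolution-aware accelerator tries to avoid.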
Keywords/Search Tags:3D CNN, 3D DCNN, Accelerator, Dataflow