
The Convolutional Neural Network Accelerator Research Based On The Tiling Dataflow

Posted on: 2019-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Li
Full Text: PDF
GTID: 2428330611993315
Subject: Electronic Science and Technology
Abstract/Summary:
Convolutional neural networks (CNNs) are among the most successful deep learning algorithms and are widely used in image recognition and object detection. As networks grow, the number of neurons and synapses increases rapidly, so CNNs are computationally intensive and their computational cost can be prohibitive. Training is usually performed on GPU platforms, but for inference a GPU is a costly and energy-inefficient choice, especially on embedded devices, so accelerators that exploit the parallelism of CNNs have become a popular alternative. The classic Tiling dataflow achieves high performance, but its processing-element (PE) utilization is very low. Meanwhile, many deep learning applications demand high-performance accelerators, which places increasingly stringent requirements on PE utilization. This thesis focuses on improving the performance of the convolution stage and considers only CNN inference. Based on the Tiling dataflow, two CNN accelerators are proposed: one for small-scale PE arrays and one for large-scale PE arrays.

To achieve higher PE utilization than the Tiling dataflow, we first propose the Single-Channel dataflow accelerator for the small-scale case. In general, a convolution layer can be written as a six-level nested loop: over the input channels, the output channels, the two dimensions of the feature maps, and the two dimensions of the convolution kernels. The Tiling dataflow unrolls the input-channel and output-channel loops. However, since the input image typically has only three channels (RGB), parallelizing too many input channels wastes PEs. Compared with the Tiling dataflow, the Single-Channel dataflow swaps the parallel dimensions and instead parallelizes the feature maps and the output channels. This accelerator improves performance by raising PE utilization, achieving a 1.2x-7x speedup while the hardware area remains essentially unchanged.

However, as performance requirements grow, the PE array inevitably becomes larger, and beyond a certain scale PE utilization declines again. This requires the hardware to exploit more parallelism from the six loops to suppress PE idling. On this basis, we propose a Configurable CNN accelerator. Combining the parallel strategies of the Single-Channel dataflow accelerator and the Tiling dataflow accelerator, the Configurable CNN accelerator unrolls four loops: the input channels, the output channels, and the two dimensions of the convolution maps. Unfortunately, unrolling more loops consumes substantial hardware resources and greatly complicates data scheduling. To reduce hardware complexity while preserving high PE utilization on a large-scale array, we unroll the input channels in only a few configurable modes. When mainstream convolutional neural networks are computed on the Configurable CNN accelerator with an array size above 512, PE utilization is maintained at an average of 82%-90%, and accelerator performance scales roughly linearly with the number of PEs.
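The six-level convolution loop and the utilization argument above can be sketched in software. The following is a minimal illustration, not the thesis's actual hardware model: the tile sizes (16x16), the layer shape (3 input channels, 64 output channels, 224x224 feature map), and the function names are assumptions chosen only to show why unrolling the input-channel loop starves PEs on an RGB first layer, while unrolling the feature-map dimension does not.

```python
# Sketch of the six-level convolution loop and a back-of-the-envelope
# PE-utilization comparison. All tile sizes and layer shapes below are
# illustrative assumptions, not figures from the thesis.
import math

def conv_six_loops(inp, weights, out_ch, in_ch, out_h, out_w, k):
    """Reference convolution: inp[c][y][x], weights[m][c][ky][kx] -> out[m][y][x]."""
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_ch)]
    for m in range(out_ch):             # loop 1: output channels
        for c in range(in_ch):          # loop 2: input channels
            for y in range(out_h):      # loop 3: feature-map rows
                for x in range(out_w):  # loop 4: feature-map columns
                    for ky in range(k):         # loop 5: kernel rows
                        for kx in range(k):     # loop 6: kernel columns
                            out[m][y][x] += inp[c][y + ky][x + kx] * weights[m][c][ky][kx]
    return out

def pe_utilization(unrolled_dims, tile_sizes):
    """Fraction of PEs doing useful work when each unrolled loop of extent d
    is mapped onto t parallel lanes (idle lanes appear when t does not divide d
    or, worse, when d < t)."""
    util = 1.0
    for d, t in zip(unrolled_dims, tile_sizes):
        util *= d / (math.ceil(d / t) * t)
    return util

# First layer of a typical network: 3 input channels (RGB), 64 output channels.
# Tiling-style unrolling over (input channels, output channels), 16x16 lanes:
tiling_util = pe_utilization([3, 64], [16, 16])           # only 3 of 16 lanes busy
# Single-Channel-style unrolling over (feature-map pixels, output channels):
single_util = pe_utilization([224 * 224, 64], [16, 16])   # all lanes busy
print(f"Tiling-style utilization:         {tiling_util:.2%}")
print(f"Single-Channel-style utilization: {single_util:.2%}")
```

With these assumed numbers the Tiling-style mapping keeps only 3/16 of the array busy on the first layer, while the Single-Channel-style mapping keeps it fully busy, which is consistent in spirit with the 1.2x-7x improvement range claimed above.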
Keywords/Search Tags:Convolutional Neural Network, Array Utilization, Tiling Dataflow, Accelerator, Hardware Complexity