
Deep Convolution Algorithm Optimization And Hardware Acceleration

Posted on: 2020-11-12  Degree: Master  Type: Thesis
Country: China  Candidate: S H Fu  Full Text: PDF
GTID: 2428330575998396  Subject: Electronic and communication engineering
Abstract/Summary:
Deep Convolutional Neural Networks (DCNNs) have been widely adopted in fields such as speech recognition and image detection. Because CNN inference is highly computation-intensive, it is difficult to deploy on embedded platforms and IoT devices with tight energy budgets. In recent years, however, the Field-Programmable Gate Array (FPGA) has matured: with its abundant computing resources, excellent energy efficiency, and programmability, it allows the design of customized parallel computing architectures that make convolutional neural networks practical on low-power embedded devices.

Today's hardware accelerators tend to build on an underlying architecture of Multiply-Accumulate (MAC) units. The drawback of this approach is that accelerator performance can be limited by the number of DSP blocks on the FPGA, while other on-chip resources remain underutilized. To address this problem, this thesis transforms the convolution computation itself, reshaping the accelerator's design space and relieving the pressure on DSP resources. By balancing the use of on-chip memory, logic resources, and DSP resources, the accelerators presented in this thesis significantly outperform the prior art.

This thesis exploits the sparsity of deep convolutional neural networks by pruning and quantizing the model, and proposes a new sparse convolution method, ABM-SpConv (Accumulate-Before-Multiply Sparse Convolution): feature-map values are first accumulated, and the partial sums are then multiplied by the nonzero weights. In this way, the convolution performs more accumulation operations than multiplication operations, so a hardware implementation becomes accumulator-bound. This relaxes the demand for DSP units on the FPGA and increases the utilization of the remaining on-chip resources.

A low-power, highly parallel FPGA-based heterogeneous computing framework is designed and written in the Open Computing Language (OpenCL). It comprises a task-scheduling unit, a load/store unit, multiple convolution units, and units for the other functional layers. The task scheduler synchronizes the convolution units to resolve the computational load imbalance between sparse convolution kernels, and the sparse network model is encoded to overcome the bandwidth inefficiency caused by the irregularity of sparse weight storage. Each convolution unit consists of a heterogeneous array of accumulators and multipliers, matching the distinct computation flow of ABM-SpConv.

The proposed sparse deep convolutional network accelerator architecture is implemented on the DE5-Net platform and evaluated on the ResNet-18 and ResNet-50 networks, with good results: a single image is recognized in 7 ms and 15 ms, respectively. Its energy efficiency is 3 times that of a GPU and 34 times that of a CPU. Throughput reaches 532 GOPS and 546 GOPS, respectively, 2 times higher than the current state-of-the-art architecture.
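The accumulate-before-multiply idea can be illustrated with a minimal sketch (not the thesis's OpenCL implementation): when weights are quantized to a small set of values, activations that share the same nonzero weight can be summed first, so each distinct weight value costs only one multiplication. The function name and data below are hypothetical, for illustration only.

```python
from collections import defaultdict

def abm_spconv_dot(weights, activations):
    """Dot product computed accumulate-before-multiply.

    weights     -- quantized weights (many zeros, few distinct values)
    activations -- input feature-map values, same length
    """
    groups = defaultdict(float)            # distinct weight -> running sum
    for w, a in zip(weights, activations):
        if w != 0:                         # sparsity: skip zero weights
            groups[w] += a                 # accumulate first ...
    return sum(w * s for w, s in groups.items())  # ... one multiply per value

# A conventional MAC needs one multiply per nonzero weight (5 here);
# accumulate-before-multiply needs one per distinct nonzero value (2 here).
w = [2, 0, 2, -1, 0, -1, 2]
x = [1.0, 5.0, 2.0, 3.0, 4.0, 1.0, 0.5]
print(abm_spconv_dot(w, x))  # prints 3.0
```

In hardware this shift in the multiply/accumulate ratio is what lets the design lean on cheap logic-fabric accumulators instead of scarce DSP multipliers.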
Keywords/Search Tags:Sparse network, Deep Convolutional Neural Network, OpenCL, FPGA, heterogeneous computing