
Accelerator Design And Research Of Depthwise Separable Convolutional Neural Network Based On FPGA

Posted on: 2022-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhan
Full Text: PDF
GTID: 2518306728466144
Subject: Master of Engineering
Abstract/Summary:
With researchers' ever-increasing demands for accuracy, convolutional neural networks are developing toward deeper layers and more complex architectures. This adds computational burden, slows down inference after deployment, and runs up against the limits of hardware resources and energy consumption. Many lightweight convolutional neural networks, together with corresponding FPGA-based hardware accelerators, have therefore been proposed in recent years. However, most past accelerators adopt a two-dimensional parallel computing mode; for newer convolution types such as depthwise separable convolution, this leads to low utilization of the hardware computing units. At the algorithm level, many designs still accelerate relatively simple networks such as VGG and MobileNet, while there are few studies on hardware optimization of the residual structures used by current mainstream networks, or of newly proposed operations such as channel shuffle.

To address these problems, this paper takes ShuffleNetV2, a network built on depthwise separable convolution, as the target network, proposes an implementation scheme for depthwise convolution with higher utilization of hardware computing resources, and optimizes the pooling and shuffle operations to be more hardware friendly. The main work of this paper is as follows:

1. First, the parallelism inherent in convolutional computation is introduced. In view of the network characteristics of ShuffleNetV2, the convolutional computing array supports two switchable modes: a standard convolution mode and a depthwise convolution mode. Under this architecture, the standard convolution mode exploits input-channel and output-channel parallelism, while the depthwise convolution mode computes in parallel along three dimensions: the sliding window, different channels, and different input images. The two parallel computing modes reduce the demand for on-chip cache and memory bandwidth while keeping the utilization of the convolution computing units high. After selecting appropriate degrees of parallelism for the network structure, the convolution module is implemented in hardware, and functional simulation of both computing modes is carried out to verify the feasibility of the design.

2. Based on the optimized architecture of the convolutional accelerator above, a hardware-friendly implementation of the pooling and shuffle operations is proposed. Pooling is executed separately along the two spatial dimensions, reducing the cache requirement. The shuffle operation uses smaller channel groups to reduce the frequency of memory access, and the data path is optimized accordingly. To reduce data-memory overhead, a double-buffered ping-pong structure is adopted to improve data throughput. Low-precision fixed-point numbers are used for the weights and feature maps to shorten multiplier latency, and a data interface module makes it easier for the accelerator to control external memory and receive data.

3. Finally, behavioral and timing simulation is performed on the proposed accelerator, and a classification test on the CIFAR-10 dataset is carried out with the ShuffleNetV2 network. Over 10,000 images, the classification accuracy reaches 94.1%. Compared with the benchmark architecture, the optimized architecture achieves higher utilization of the computing array, improves classification performance by 23.8%, and reaches a processing speed of 760 frames per second. Compared with other FPGA accelerators evaluated on the same dataset, the results show that our design offers relatively better image processing speed and model accuracy.
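To make the computation being accelerated concrete, the following is a minimal NumPy sketch of a depthwise separable convolution: a depthwise stage in which each input channel is filtered independently (the part that maps onto the accelerator's depthwise mode, parallel over windows and channels), followed by a 1x1 pointwise stage that mixes channels (which fits the standard-convolution mode). This is an illustrative software model, not the thesis's hardware design; all names and the valid-padding/stride-1 convention are assumptions for the example.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise separable convolution = depthwise conv + 1x1 pointwise conv.

    x          : input feature map, shape (C_in, H, W)
    dw_kernels : one KxK filter per input channel, shape (C_in, K, K)
    pw_kernels : 1x1 filters mixing channels, shape (C_out, C_in)
    Returns an output of shape (C_out, H-K+1, W-K+1) (valid padding, stride 1).
    """
    c_in, h, w = x.shape
    k = dw_kernels.shape[1]
    oh, ow = h - k + 1, w - k + 1

    # Depthwise stage: each channel is filtered independently, so hardware
    # can parallelise over sliding windows and channels simultaneously.
    dw_out = np.zeros((c_in, oh, ow))
    for c in range(c_in):
        for i in range(oh):
            for j in range(ow):
                dw_out[c, i, j] = np.sum(x[c, i:i+k, j:j+k] * dw_kernels[c])

    # Pointwise stage: a 1x1 convolution is a matrix multiply over channels,
    # which maps naturally onto a standard-convolution computing mode.
    return np.tensordot(pw_kernels, dw_out, axes=([1], [0]))
```

Because the depthwise stage has no cross-channel accumulation, a two-dimensional (input-channel x output-channel) array leaves most multipliers idle on it, which is exactly the utilization problem the switchable-mode array addresses.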
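The channel shuffle operation itself can be illustrated with the standard reshape-transpose-reshape trick used by the ShuffleNet family; this is a reference software model of the operation being made hardware friendly, not the thesis's implementation. Choosing a smaller `groups` value shrinks the stride of the permutation, which is the intuition behind using smaller channel groups to cut memory-access frequency.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Channel shuffle: reshape into groups, swap the group and channel
    axes, then flatten back. Channel g*n + i moves to position i*groups + g.

    x : feature map of shape (C, H, W), with C divisible by `groups`.
    """
    c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))
```

Since the operation is a pure, statically known permutation of channel indices, a hardware data path can realize it by rewiring addresses rather than moving data, which is what makes it a good target for hardware-friendly optimization.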
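The effect of storing weights and feature maps as low-precision fixed-point numbers can be sketched as follows. The abstract does not state the bit widths used, so the Q-format split below is an assumption chosen purely for illustration; the rounding and saturation behavior mimics what the accelerator's integer multipliers would operate on.

```python
import numpy as np

def to_fixed_point(x, int_bits, frac_bits):
    """Quantize a float array to signed fixed-point Q(int_bits.frac_bits).

    Values are scaled by 2**frac_bits, rounded to the nearest integer, and
    saturated to the representable signed range.
    Returns (integer codes, dequantized float values).
    """
    scale = 2 ** frac_bits
    total_bits = int_bits + frac_bits + 1      # +1 for the sign bit
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    codes = np.clip(np.round(x * scale), lo, hi).astype(np.int32)
    return codes, codes / scale
```

Multiplying two such codes needs only a narrow integer multiplier (with the product rescaled by the combined fraction bits), which is what shortens the multiplier's operation time relative to floating point.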
Keywords/Search Tags:FPGA, Convolutional Neural Network Accelerator, Depthwise Separable Convolution, Parallel Architecture