
Design Of Accelerator For MobileNet Convolutional Neural Network Based On FPGA

Posted on: 2021-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: J W Liao
Full Text: PDF
GTID: 2518306200450284
Subject: IC Engineering
Abstract/Summary:
In recent years, with the deepening research on deep learning, the convolutional neural network, as its basic model, has also developed greatly and is used in many fields. Convolutional neural network algorithms are usually implemented in software on a CPU or GPU, but for energy-limited mobile devices such as mobile phones and drones, software acceleration alone cannot meet the growing speed and power requirements. How to design a convolutional neural network accelerator in hardware has therefore become a research focus in academia, and FPGAs, with their high parallelism, configurability, design flexibility, and excellent performance-to-power ratio, have gradually become a good platform for convolutional neural network hardware acceleration.

In this paper, the lightweight convolutional neural network MobileNet is selected as the basic framework for research. MobileNet is a mobile-first computer vision model. It uses depthwise separable convolution instead of standard convolution; it has few parameters, low latency, and low power consumption, making it very suitable for mobile and embedded devices. This design uses the Slim library in TensorFlow to train MobileNet, and finally uses a Zynq xc7z045 as the hardware platform to build the MobileNet hardware acceleration system in the form of CPU + FPGA. The key techniques adopted by the system are:

(1) A dedicated parallel acceleration scheme is designed for the standard convolution and the depthwise separable convolution used by MobileNet, so that each convolution is computed with maximum parallelism.

(2) Hidden batch normalization is used to optimize the computation of each convolution module in the hardware implementation, saving resources and increasing speed.

(3) Each convolutional layer is implemented in hardware in a modular manner, with a configurable depthwise convolution module and a configurable pointwise convolution module, so that all 13 depthwise separable convolution layers can be carried out.

(4) A timing control module opens each convolution module in pipelined fashion, maximizing the utilization of the acceleration modules and ensuring the efficiency of the entire acceleration system.

(5) An overall structure of compressible size is designed around MobileNet's width-multiplier hyperparameter, so the degree of model compression can be chosen according to actual needs, improving the practicality of the system.

The experimental results show that at a 100 MHz clock frequency the power consumption of the entire system is 2.49 W, and with the width multiplier set to 1, 0.75, 0.5, and 0.25, the system achieves frame rates of 1.40 fps, 2.48 fps, 5.57 fps, and 22.23 fps respectively. Compared with implementations of the MobileNet convolutional neural network on other hardware platforms, this design is 8.24 times faster than an i5-5200U CPU, and its speed-to-power ratio is 6.96 times that of an NVIDIA GTX970 GPU. Compared with recent implementations of convolutional neural networks on the same Zynq xc7z045 hardware platform, this design occupies relatively few resources and has certain advantages in speed-to-power ratio.
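As background for the depthwise separable convolution discussed above, the parameter savings over a standard convolution can be sketched as follows. This is a minimal illustration, not code from the thesis; the example layer shape (3x3 kernel, 64 input channels, 128 output channels) is an assumption chosen only to show the arithmetic.

```python
def standard_conv_params(k, c_in, c_out):
    # a standard convolution learns a k x k x c_in kernel
    # for each of the c_out output channels
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise stage: one k x k kernel per input channel
    # pointwise stage: a 1x1 convolution that mixes channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728 parameters
sep = depthwise_separable_params(3, 64, 128)  # 8768 parameters
print(std, sep, round(std / sep, 2))          # roughly 8.4x fewer
```

This roughly k*k-fold reduction in parameters (and multiply–accumulate operations) is what makes MobileNet attractive for the resource-constrained FPGA implementation described here.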
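"Hidden" batch normalization is commonly realized by folding the normalization parameters into the convolution weights and bias at inference time, so no separate BN stage is needed in the hardware datapath. The thesis does not give its exact formulation, so the sketch below shows the standard folding; all names are illustrative.

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch norm into conv weights and bias.

    w: (c_out, ...) convolution weights; b: (c_out,) bias;
    gamma, beta, mean, var: per-output-channel BN parameters.
    """
    scale = gamma / np.sqrt(var + eps)
    # broadcast the per-channel scale over the remaining weight dims
    w_folded = w * scale.reshape(-1, *([1] * (w.ndim - 1)))
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded
```

Convolving with the folded weights and bias produces the same output as convolution followed by batch normalization, which is why the BN computation can be "hidden" inside each convolution module, saving hardware resources.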
Keywords/Search Tags:MobileNet, FPGA, Depthwise Separable Convolution, Configurable, Acceleration