
Design Of Accelerator For MobileNet Convolutional Neural Network Based On FPGA

Posted on: 2021-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: J W Liao
Full Text: PDF
GTID: 2518306200450284
Subject: IC Engineering
Abstract/Summary:
In recent years, with the deepening research on deep learning, the convolutional neural network, as its basic model, has also developed greatly and is used in many fields. Convolutional neural network algorithms are usually implemented in software on a CPU or GPU, but for energy-limited mobile devices such as mobile phones and drones, software acceleration alone cannot meet the growing speed and power requirements. How to design a convolutional neural network accelerator in hardware has therefore become a research focus in academia, and FPGAs, with their high parallelism, configurability, design flexibility, and excellent performance-to-power ratio, have gradually become a good platform for convolutional neural network hardware acceleration.

In this paper, the lightweight convolutional neural network MobileNet is selected as the basic framework for research. MobileNet is a mobile-first computer vision model. It uses depthwise separable convolution instead of standard convolution; it has few parameters, low latency, and low power consumption, making it very suitable for mobile and embedded devices. This design uses the Slim library in TensorFlow to train MobileNet, and finally uses a Zynq xc7z045 as the hardware platform to build the MobileNet hardware acceleration system in the form of CPU + FPGA. The key techniques adopted by the system are:

(1) A dedicated parallel acceleration scheme is designed for the standard convolution and the depthwise separable convolution used by MobileNet, so that each convolution is computed with maximum parallelism.

(2) Hidden batch normalization is used to optimize the computation of each convolution module in the hardware implementation, saving resources and increasing speed.

(3) Each convolutional layer is implemented in hardware in a modular manner, with a configurable depthwise convolution module and a configurable pointwise convolution module, so that all 13 depthwise separable convolution layers can be carried out.

(4) A timing control module opens each convolution module in pipelined fashion, maximizing the utilization of the acceleration modules and ensuring the efficiency of the entire acceleration system.

(5) An overall structure of compressible size is designed around MobileNet's width-multiplier hyperparameter, so the degree of model compression can be chosen according to actual needs, improving the practicality of the system.

The experimental results show that at a 100 MHz clock frequency the power consumption of the entire system is 2.49 W, and with the width multiplier set to 1, 0.75, 0.5, and 0.25, the system achieves frame rates of 1.40 fps, 2.48 fps, 5.57 fps, and 22.23 fps respectively. Compared with implementations of the MobileNet convolutional neural network on other hardware platforms, this design is 8.24 times faster than an i5-5200U CPU, and its speed-to-power ratio is 6.96 times that of an NVIDIA GTX970 GPU. Compared with recent implementations of convolutional neural networks on the same Zynq xc7z045 hardware platform, this design occupies relatively few resources and has certain advantages in speed-to-power ratio.
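As background for the depthwise separable convolution discussed above, the parameter savings over a standard convolution can be sketched as follows. This is a minimal illustration, not code from the thesis; the example layer shape (3x3 kernel, 64 input channels, 128 output channels) is an assumption chosen only to show the arithmetic.

```python
def standard_conv_params(k, c_in, c_out):
    # a standard convolution learns a k x k x c_in kernel
    # for each of the c_out output channels
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise stage: one k x k kernel per input channel
    # pointwise stage: a 1x1 convolution that mixes channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728 parameters
sep = depthwise_separable_params(3, 64, 128)  # 8768 parameters
print(std, sep, round(std / sep, 2))          # roughly 8.4x fewer
```

This roughly k*k-fold reduction in parameters (and multiply–accumulate operations) is what makes MobileNet attractive for the resource-constrained FPGA implementation described here.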
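"Hidden" batch normalization is commonly realized by folding the normalization parameters into the convolution weights and bias at inference time, so no separate BN stage is needed in the hardware datapath. The thesis does not give its exact formulation, so the sketch below shows the standard folding; all names are illustrative.

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch norm into conv weights and bias.

    w: (c_out, ...) convolution weights; b: (c_out,) bias;
    gamma, beta, mean, var: per-output-channel BN parameters.
    """
    scale = gamma / np.sqrt(var + eps)
    # broadcast the per-channel scale over the remaining weight dims
    w_folded = w * scale.reshape(-1, *([1] * (w.ndim - 1)))
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded
```

Convolving with the folded weights and bias produces the same output as convolution followed by batch normalization, which is why the BN computation can be "hidden" inside each convolution module, saving hardware resources.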
Keywords/Search Tags:MobileNet, FPGA, Depthwise Separable Convolution, Configurable, Acceleration