
Study and Design of a Dedicated Convolutional Neural Network Inference Accelerator Based on the Face-Detection YOLO Algorithm

Posted on: 2019-08-22    Degree: Master    Type: Thesis
Country: China    Candidate: C Luo    Full Text: PDF
GTID: 2428330566987577    Subject: Engineering
Abstract/Summary:
In recent years, with the rapid development of artificial intelligence, many deep learning models have emerged, and convolutional neural networks in particular have made great progress in machine vision. Most deep learning algorithms run on cloud servers because of their powerful computing capabilities. However, the real-time, security, and offline requirements of terminal devices mean that deep learning networks currently run in the cloud should also be able to run locally on the device. The study of dedicated convolutional neural network inference accelerators, designed for a specific application and algorithm, is therefore important and useful, and it has become a hot topic in academia and industry. This thesis focuses on the hardware architecture, computing performance, power consumption, on-chip cache, and hardware resource utilization of a dedicated convolutional neural network inference accelerator for the face-detection YOLO algorithm. The main work is as follows:

(1) Study and analyze the YOLO algorithm for face detection, adjust the training parameters and network structure, train a convolutional neural network model suited to face detection, and obtain the trained weights and inference parameters. Based on an analysis of the face-detection algorithm model, a system implementation framework is proposed.

(2) For floating-point data processing with the face-detection YOLO algorithm, a convolutional neural network inference accelerator combining data reuse and distributed on-chip memory is proposed; it accelerates the inference process while reducing the module's external data bandwidth requirements. A corresponding instruction set and working modes are defined, and the parameters of the different network layers are delivered through instructions so that the accelerator performs the appropriate arithmetic operations for each layer (a hypothetical configuration sketch follows this list).

(3) For binarized data processing with the face-detection YOLO algorithm, a convolutional neural network inference accelerator combining binarization and distributed on-chip memory is proposed; it shrinks the weights, inference parameters, and input data, and speeds up the calculation process (see the XNOR-popcount sketch below).

(4) The above inference accelerators were designed and simulated on a Xilinx FPGA. The simulation results show that the floating-point accelerator reaches a peak throughput of 3.188 GMAC/s with a power consumption of 2.519 W at a 100 MHz clock, which is 8.46 times faster than a general-purpose CPU at only 3.88% of its power consumption. At the same 100 MHz clock, the binarized accelerator processes only 1/32 of the data volume of the floating-point version and avoids a large number of floating-point multiply-accumulate operations, which further accelerates inference; however, the detection accuracy of the binarized YOLO algorithm still needs improvement in future work (a quick arithmetic check on the throughput figure also follows).
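To illustrate the per-layer instruction mechanism described in (2): the abstract does not give the thesis's actual instruction format, so the descriptor below is only a hypothetical Python sketch of the kind of per-layer parameters such an instruction set would carry. Every field name here is an assumption.

```python
from dataclasses import dataclass

# Hypothetical sketch only: the real instruction encoding is not given in the
# abstract, and every field name below is an assumption for illustration.
@dataclass
class LayerInstruction:
    layer_type: str    # e.g. "conv" or "maxpool"
    in_channels: int
    out_channels: int
    kernel_size: int
    stride: int
    weight_base: int   # offset of this layer's weights in on-chip memory
    input_base: int    # offset of the layer's input feature map
    output_base: int   # offset where results are written back

# The host would issue one descriptor per network layer; the accelerator
# decodes it and reconfigures its datapath before streaming the layer's data.
conv1 = LayerInstruction("conv", 3, 16, 3, 1, 0x0000, 0x4000, 0x8000)
```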
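For the binarized accelerator in (3), the 1/32 data reduction follows from replacing each 32-bit floating-point value with a 1-bit sign. A common realization of binarized convolution (XNOR-Net style; the abstract does not confirm the thesis uses exactly this scheme) replaces each multiply-accumulate with an XNOR followed by a popcount. A minimal Python sketch:

```python
import numpy as np

def binarize(x):
    # Keep only the sign of each value and pack the bits: 1 bit instead of a
    # 32-bit float, which is the 1/32 data reduction cited in the abstract.
    return np.packbits((x >= 0).astype(np.uint8))

def binary_dot(pa, pb, n):
    # XNOR marks positions where the two sign bits agree; popcount counts them.
    xnor = ~(pa ^ pb)
    ones = sum(bin(int(w)).count("1") for w in xnor)
    matches = ones - (pa.size * 8 - n)   # discard the packbits padding bits
    return 2 * matches - n               # dot product over {+1, -1} values

a = np.random.randn(256).astype(np.float32)
b = np.random.randn(256).astype(np.float32)
# Agrees with the floating-point dot product of the sign vectors:
print(binary_dot(binarize(a), binarize(b), a.size),
      int(np.where(a >= 0, 1, -1) @ np.where(b >= 0, 1, -1)))
```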
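Finally, a quick arithmetic check on the figures reported in (4), under the assumption that peak throughput equals the number of parallel MAC units times the clock rate:

```python
# Assumption: peak throughput = parallel MAC units x clock rate.
peak = 3.188e9        # 3.188 GMAC/s, as reported
clock = 100e6         # 100 MHz
print(peak / clock)   # ~31.9, i.e. roughly 32 MAC operations per cycle
```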
Keywords/Search Tags: Inference Accelerator, Special-Purpose, Convolutional Neural Network, YOLO Algorithm