
Design And Implementation Of High Energy Efficiency Convolutional Neural Network Accelerator For Mobile Terminal

Posted on: 2023-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Shi
Full Text: PDF
GTID: 2558307154974449
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of artificial intelligence, modern convolutional neural networks (CNNs) have achieved great success in image classification, image segmentation, object detection, and related fields. As performance requirements grow ever higher, complex network models continue to evolve toward deeper structures, which brings a significant increase in computation and storage. In real application scenarios such as mobile or embedded devices, such large and complex models are difficult to deploy: first, the models are too large and face insufficient memory on mobile devices; second, some real-time scenarios demand low latency, fast response, and high classification accuracy. It is therefore important to study energy-efficient, lightweight CNN models and to accelerate them in hardware.

To address the problem of insufficient memory when complex neural network models are deployed on mobile devices, this dissertation proposes two network compression algorithms to simplify the original MobileNet-SSD model. First, an analysis of the network's performance bottleneck shows that the point-wise convolution layers account for the largest share of both parameters and execution time in the model. A convolution-filter pruning algorithm is therefore proposed specifically for the parameter redundancy of point-wise convolution: the filters in each point-wise layer are ranked by importance, the filters that contribute least to feature extraction are removed, and the accuracy lost during pruning is recovered by retraining. Second, an INT16 quantization strategy is proposed that uniformly converts the trained floating-point parameters into fixed-point numbers for the model's offline prediction. On the one hand this further compresses the network model; on the other hand it reduces the involvement of floating-point computing units and effectively improves prediction speed.

To address the problem that object detection algorithms cannot maintain both high performance and high accuracy on mobile devices with limited area and power budgets, a MobileNet-SSD object detection hardware accelerator is designed for the Field Programmable Gate Array (FPGA) platform using a software-hardware co-design method. A configurable convolution acceleration array is designed to realize multi-granularity parallelism across network layers of different scales through loop tiling. On this basis, a line-buffer optimization mechanism for the input buffer is further designed, which combines Direct Memory Access (DMA) with a data-stream interface to transfer data and resolve the transmission-delay bottleneck. Experiments show that the performance and energy efficiency of the proposed object detection system are 89 times and 7 times higher than those of CPU and GPU implementations, respectively. Compared with object detection systems proposed in previous work, it achieves higher accuracy and better performance.
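The filter-pruning step described in the abstract can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the abstract does not specify the importance measure, so the L1 norm of each filter (a common choice for structured pruning) is assumed here, along with a hypothetical `keep_ratio` parameter.

```python
import numpy as np

def prune_pointwise_filters(weights, keep_ratio=0.5):
    """Rank 1x1 (point-wise) conv filters by assumed L1-norm importance
    and keep only the most important fraction.

    weights: array of shape (out_channels, in_channels, 1, 1).
    Returns the pruned weight tensor and the indices of kept filters.
    """
    # Importance score per output filter: sum of absolute weights (L1 norm).
    importance = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(weights.shape[0] * keep_ratio))
    # Indices of the n_keep most important filters, in original order.
    kept = np.sort(np.argsort(importance)[::-1][:n_keep])
    return weights[kept], kept

# Toy example: prune half of 8 point-wise filters over 4 input channels.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 1, 1))
pruned, kept = prune_pointwise_filters(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 4, 1, 1)
```

In the dissertation's pipeline, a retraining pass would follow this step to recover the accuracy lost by removing filters.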
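The INT16 quantization strategy converts trained floating-point parameters to fixed-point numbers. A minimal sketch of such a conversion is shown below; the number of fractional bits (`frac_bits=8`, i.e. a Q8.8-style format) and symmetric rounding are assumptions, since the abstract does not state the exact fixed-point format.

```python
import numpy as np

def quantize_int16(x, frac_bits=8):
    """Convert float parameters to INT16 fixed-point with `frac_bits`
    fractional bits, clipping to the representable INT16 range."""
    scale = 1 << frac_bits
    return np.clip(np.round(x * scale), -32768, 32767).astype(np.int16)

def dequantize_int16(q, frac_bits=8):
    """Recover an approximate float value from the fixed-point encoding."""
    return q.astype(np.float32) / (1 << frac_bits)

# Example: 1.5 becomes 1.5 * 256 = 384 in Q8.8 fixed point.
q = quantize_int16(np.array([1.5, -2.25]))
print(q)  # [ 384 -576]
```

With all parameters in INT16, inference can run on integer arithmetic units, which is what yields the speedup the abstract attributes to reduced floating-point participation.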
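The loop-tiling idea behind the configurable acceleration array can be illustrated in software. The sketch below is a plain Python model of a direct convolution whose output-channel and output-row loops are tiled, mimicking how a hardware array would process one tile at a time; the tile sizes and loop order are illustrative assumptions, not the accelerator's actual schedule.

```python
import numpy as np

def conv2d_tiled(x, w, tile_oc=4, tile_h=8):
    """Direct convolution with loop tiling over output channels and rows.

    x: input of shape (C_in, H, W); w: filters of shape (C_out, C_in, K, K).
    Stride 1, no padding. Each (tile_oc x tile_h) tile corresponds to one
    unit of work a hardware compute array could schedule independently.
    """
    c_in, h, wd = x.shape
    c_out, _, k, _ = w.shape
    oh, ow = h - k + 1, wd - k + 1
    y = np.zeros((c_out, oh, ow), dtype=x.dtype)
    for oc0 in range(0, c_out, tile_oc):        # tile over output channels
        for r0 in range(0, oh, tile_h):         # tile over output rows
            for oc in range(oc0, min(oc0 + tile_oc, c_out)):
                for r in range(r0, min(r0 + tile_h, oh)):
                    for c in range(ow):
                        # One output element: window of x times filter oc.
                        y[oc, r, c] = np.sum(x[:, r:r + k, c:c + k] * w[oc])
    return y

# Toy example: 2 input channels, 6x6 input, three 3x3 filters.
x = np.arange(2 * 6 * 6, dtype=np.float64).reshape(2, 6, 6)
w = np.ones((3, 2, 3, 3))
y = conv2d_tiled(x, w, tile_oc=2, tile_h=2)
print(y.shape)  # (3, 4, 4)
```

On the FPGA, the rows consumed by each tile would be staged through the line buffer and fed by DMA over the data-stream interface, rather than read from a NumPy array as here.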
Keywords/Search Tags:Object detection, Convolutional neural network, FPGA, Software-hardware co-design