
Design And Implementation Of High Energy Efficiency Convolutional Neural Network Accelerator For Mobile Terminal

Posted on: 2023-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Shi
Full Text: PDF
GTID: 2558307154974449
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of artificial intelligence, modern convolutional neural networks (CNNs) have achieved great success in image classification, image segmentation, object detection, and related fields. As performance requirements grow ever higher, complex network models continue to evolve toward deeper structures, which brings a significant increase in computation and storage. In real application scenarios such as mobile or embedded devices, such large and complex models are difficult to deploy: first, the models are too large and face insufficient memory on mobile devices; second, some real-time scenarios demand low latency, fast response, and high classification accuracy. It is therefore important to study energy-efficient, lightweight CNN models and to accelerate them in hardware.

To address the problem of insufficient memory when complex neural network models are deployed on mobile devices, this dissertation proposes two network compression algorithms to simplify the original MobileNet-SSD model. First, an analysis of the network's performance bottleneck shows that the point-wise convolution layers account for the largest share of both parameters and execution time in the model. A convolution-filter pruning algorithm is therefore proposed specifically for the parameter redundancy of point-wise convolution: the filters in each point-wise layer are ranked by importance, the filters that contribute least to feature extraction are removed, and the accuracy lost during pruning is recovered by retraining. Second, an INT16 quantization strategy is proposed that uniformly converts the trained floating-point parameters into fixed-point numbers for the model's offline prediction. On the one hand this further compresses the network model; on the other hand it reduces the involvement of floating-point computing units and effectively improves prediction speed.

To address the problem that object detection algorithms cannot maintain both high performance and high accuracy on mobile devices with limited area and power budgets, a MobileNet-SSD object detection hardware accelerator is designed for the Field Programmable Gate Array (FPGA) platform using a software-hardware co-design method. A configurable convolution acceleration array is designed to realize multi-granularity parallelism across network layers of different scales through loop tiling. On this basis, a line-buffer optimization mechanism for the input buffer is further designed, which combines Direct Memory Access (DMA) with a data-stream interface to transfer data and resolve the transmission-delay bottleneck. Experiments show that the performance and energy efficiency of the proposed object detection system are 89 times and 7 times higher than those of CPU and GPU implementations, respectively. Compared with object detection systems proposed in previous work, it achieves higher accuracy and better performance.
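The filter-pruning step described in the abstract can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the abstract does not specify the importance measure, so the L1 norm of each filter (a common choice for structured pruning) is assumed here, along with a hypothetical `keep_ratio` parameter.

```python
import numpy as np

def prune_pointwise_filters(weights, keep_ratio=0.5):
    """Rank 1x1 (point-wise) conv filters by assumed L1-norm importance
    and keep only the most important fraction.

    weights: array of shape (out_channels, in_channels, 1, 1).
    Returns the pruned weight tensor and the indices of kept filters.
    """
    # Importance score per output filter: sum of absolute weights (L1 norm).
    importance = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(weights.shape[0] * keep_ratio))
    # Indices of the n_keep most important filters, in original order.
    kept = np.sort(np.argsort(importance)[::-1][:n_keep])
    return weights[kept], kept

# Toy example: prune half of 8 point-wise filters over 4 input channels.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 1, 1))
pruned, kept = prune_pointwise_filters(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 4, 1, 1)
```

In the dissertation's pipeline, a retraining pass would follow this step to recover the accuracy lost by removing filters.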
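The INT16 quantization strategy converts trained floating-point parameters to fixed-point numbers. A minimal sketch of such a conversion is shown below; the number of fractional bits (`frac_bits=8`, i.e. a Q8.8-style format) and symmetric rounding are assumptions, since the abstract does not state the exact fixed-point format.

```python
import numpy as np

def quantize_int16(x, frac_bits=8):
    """Convert float parameters to INT16 fixed-point with `frac_bits`
    fractional bits, clipping to the representable INT16 range."""
    scale = 1 << frac_bits
    return np.clip(np.round(x * scale), -32768, 32767).astype(np.int16)

def dequantize_int16(q, frac_bits=8):
    """Recover an approximate float value from the fixed-point encoding."""
    return q.astype(np.float32) / (1 << frac_bits)

# Example: 1.5 becomes 1.5 * 256 = 384 in Q8.8 fixed point.
q = quantize_int16(np.array([1.5, -2.25]))
print(q)  # [ 384 -576]
```

With all parameters in INT16, inference can run on integer arithmetic units, which is what yields the speedup the abstract attributes to reduced floating-point participation.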
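The loop-tiling idea behind the configurable acceleration array can be illustrated in software. The sketch below is a plain Python model of a direct convolution whose output-channel and output-row loops are tiled, mimicking how a hardware array would process one tile at a time; the tile sizes and loop order are illustrative assumptions, not the accelerator's actual schedule.

```python
import numpy as np

def conv2d_tiled(x, w, tile_oc=4, tile_h=8):
    """Direct convolution with loop tiling over output channels and rows.

    x: input of shape (C_in, H, W); w: filters of shape (C_out, C_in, K, K).
    Stride 1, no padding. Each (tile_oc x tile_h) tile corresponds to one
    unit of work a hardware compute array could schedule independently.
    """
    c_in, h, wd = x.shape
    c_out, _, k, _ = w.shape
    oh, ow = h - k + 1, wd - k + 1
    y = np.zeros((c_out, oh, ow), dtype=x.dtype)
    for oc0 in range(0, c_out, tile_oc):        # tile over output channels
        for r0 in range(0, oh, tile_h):         # tile over output rows
            for oc in range(oc0, min(oc0 + tile_oc, c_out)):
                for r in range(r0, min(r0 + tile_h, oh)):
                    for c in range(ow):
                        # One output element: window of x times filter oc.
                        y[oc, r, c] = np.sum(x[:, r:r + k, c:c + k] * w[oc])
    return y

# Toy example: 2 input channels, 6x6 input, three 3x3 filters.
x = np.arange(2 * 6 * 6, dtype=np.float64).reshape(2, 6, 6)
w = np.ones((3, 2, 3, 3))
y = conv2d_tiled(x, w, tile_oc=2, tile_h=2)
print(y.shape)  # (3, 4, 4)
```

On the FPGA, the rows consumed by each tile would be staged through the line buffer and fed by DMA over the data-stream interface, rather than read from a NumPy array as here.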
Keywords/Search Tags:Object detection, Convolutional neural network, FPGA, Software-hardware co-design