
Research On Lightweight Convolutional Neural Network Accelerator

Posted on: 2020-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Yu
Full Text: PDF
GTID: 2428330602450758
Subject: Engineering
Abstract/Summary:
Compared with traditional image-processing methods, deep neural networks offer significantly better efficiency and accuracy and play an important role in many applications. However, convolutional neural networks are extremely computation-intensive: general-purpose processors such as CPUs struggle to accelerate them and suffer from low efficiency and high latency. High-end GPUs compute convolutional neural networks efficiently, but their high power consumption makes them unsuitable for mobile platforms. Mobile applications such as autonomous driving and robotics demand low latency and low power consumption, so CPUs and GPUs are not the best choice for these workloads. FPGAs integrate a large number of DSP resources and, because they do not execute an instruction stream, can keep their computing resources fully utilized, which gives them great advantages in accelerating data-intensive computation while consuming far less power than high-end GPUs. However, large networks such as VGG are difficult to implement on resource-constrained FPGAs because of their huge parameter and computation volumes, whereas the lightweight network MobileNetV2 replaces standard convolution with depthwise separable convolution, significantly reducing both computation and parameters. The structure of MobileNetV2 therefore makes it easier to implement on an FPGA and points to the feasibility of running complex convolutional neural networks on mobile platforms. Accelerating the lightweight MobileNetV2 model is thus of great significance for deploying complex convolutional neural networks on the mobile side.

Against this background, the thesis designs and implements a lightweight convolutional neural network accelerator that speeds up the network's forward pass through structural optimization, pipelining, and increased parallelism. The main work of the thesis includes:

(1) Training the weights of the MobileNetV2 network on the CIFAR-10 dataset with the Keras framework, saving the weights of the most accurate epoch for the forward-inference network.

(2) Analyzing the data-transfer patterns and the storage and bandwidth requirements of various parallelization schemes, and finally selecting 9-way parallelism across the elements of each convolution window of the feature map combined with 8-way parallelism across output feature maps.

(3) Designing and implementing a general convolution module matched to the resources and features of the FPGA, with the data-transfer structure optimized for depthwise separable convolution. Tests show that the depthwise separable structure runs about 6.05 times faster than a standard convolution structure.

(4) Designing a high-performance convolutional neural network accelerator architecture that integrates a convolutional arithmetic-unit array with alternating (ping-pong) buffers and applies data compression. This structure performs convolution operations in batches, reducing the latency caused by parameter loading, saving storage, and significantly improving efficiency.

(5) Realizing the acceleration of the MobileNetV2 forward-inference network on an Altera experimental platform and analyzing resource usage and performance. MobileNetV2 achieves 96.61% image-classification accuracy on the DE1-SoC and takes 5.2 ms to process a single image. Compared with the Cortex-A9's 98.52% accuracy and 18.3 ms per image, the accelerator is about 3.52 times faster with only a 1.91% loss of accuracy.
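The training step in (1) can be illustrated with a minimal Keras sketch. The abstract only states that MobileNetV2 was trained on CIFAR-10 and the highest-accuracy weights were saved; all hyperparameters below (optimizer, epochs, batch size, file name) are placeholders, not the thesis's settings.

```python
# Hedged sketch of the training described in work item (1):
# MobileNetV2 on CIFAR-10 with Keras, keeping the best-epoch weights.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

# Randomly initialized MobileNetV2 sized for 32x32 CIFAR-10 images.
model = tf.keras.applications.MobileNetV2(
    input_shape=(32, 32, 3), weights=None, classes=10)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Save only the checkpoint with the best validation accuracy, so the
# exported weights feed the FPGA forward-inference network.
ckpt = tf.keras.callbacks.ModelCheckpoint(
    "mobilenetv2_cifar10.h5", monitor="val_accuracy",
    save_best_only=True)

model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=50, batch_size=128, callbacks=[ckpt])
```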
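The parallelization scheme in (2) can be emulated in software to make the arithmetic concrete: 9 multipliers cover one 3x3 convolution window, and the same window is broadcast to 8 output-channel kernels, giving 9 x 8 = 72 multiply-accumulates per cycle. This is an illustrative model under those assumptions, not the thesis's RTL; all names and shapes are hypothetical.

```python
import numpy as np

K, P_OUT = 3, 8  # 3x3 window, 8 output feature maps in parallel

def pe_array_cycle(window, kernels):
    """Emulate one cycle of the assumed PE array.

    window:  (3, 3) input patch from the feature map
    kernels: (8, 3, 3) weight tiles, one per parallel output channel
    Returns 8 partial sums, one per output feature map.
    """
    # In hardware: 72 parallel multipliers followed by 8 adder trees.
    return (kernels * window).reshape(P_OUT, -1).sum(axis=1)

rng = np.random.default_rng(0)
window = rng.standard_normal((K, K)).astype(np.float32)
kernels = rng.standard_normal((P_OUT, K, K)).astype(np.float32)
print(pe_array_cycle(window, kernels))  # 8 partial sums per "cycle"
```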
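The computation savings behind (3) follow from counting multiply-accumulates: a depthwise separable layer replaces one k x k standard convolution with a per-channel k x k depthwise convolution plus a 1 x 1 pointwise convolution, shrinking the MAC count by roughly a factor of 1/C_out + 1/k^2. The shapes below are illustrative, not taken from the thesis.

```python
# Rough MAC-count comparison: standard vs. depthwise separable convolution.

def standard_conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a standard k x k convolution."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """Depthwise k x k conv per channel, then a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

h = w = 32                    # feature-map size (CIFAR-10 scale)
c_in, c_out, k = 64, 128, 3   # example channel counts, 3x3 kernel

std = standard_conv_macs(h, w, c_in, c_out, k)
sep = depthwise_separable_macs(h, w, c_in, c_out, k)
print(f"standard: {std:,} MACs, separable: {sep:,} MACs, "
      f"ratio: {std / sep:.2f}x")
```

For these shapes the ideal ratio is about 8.4x; the 6.05x speedup measured in the thesis is plausibly lower because memory transfers and control logic also contribute to runtime.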
Keywords/Search Tags: accelerator, FPGA, convolutional neural network, depth-wise