
Design And Implementation Of Lightweight Convolutional Neural Network Accelerator On SoPC

Posted on: 2021-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lei
Full Text: PDF
GTID: 2518306476950459
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, with the development of artificial intelligence, convolutional neural networks (CNNs) have been widely adopted in the computer vision field. CNNs have shown their superiority in tasks such as image classification and object detection. However, their high computational complexity makes them hard to deploy directly on edge/mobile devices, whose performance and energy budgets are extremely limited. Lightweight networks and dedicated circuits are two solutions that make it possible to run CNNs on such devices. Following these ideas, this thesis proposes a lightweight CNN accelerator on an SoPC (System on a Programmable Chip), on which the inference stage of MobileNetV2 runs.

First, the parallelism in MobileNetV2 is analyzed so that loops can be unrolled and the computation parallelized. Different parallelism strategies are applied according to the convolution type. The runtime under different parallel factors is roughly estimated by a quantitative model, and the parallel factors are then decided.

Next, the accelerator system on the SoPC is implemented; it can be divided into hardware and software parts. The main function of the hardware is to compute the convolution layers. It consists of control logic, a convolution module, DMA, a parameter cache, feature-map buffers, and CSRs. In the convolution module, the different forms of parallelism are achieved by unified units with a configurable data path. Batch normalization is optimized to avoid division and square-root operations. The DMA and a parameter cache with a pre-fetch function are introduced to reduce the latency of reading parameters from DRAM. Different kinds of on-chip buffers store the high-throughput intermediate data. The software consists of middleware on the PC and a program on the PS side. The middleware extracts the parameters, converts them to fixed-point numbers, and packs them into a format readable by the DMA. The program on the PS controls the PL, computes the FC layer, and performs classification. The FC layer is optimized to run on the NEON unit, rather than the FPU, for parallelism. Moreover, the convolution layers and the FC layer are pipelined for multi-image classification, so that the PL and PS work simultaneously to deliver higher throughput.

Finally, a test platform based on a PC and a ZC702 board is built to test the accelerator system. The accelerator runs at 150 MHz with the MobileNetV2-0.5-96 model. Data are quantized to signed 16-bit fixed-point numbers, and 169 DSPs are utilized in the system. The system classifies an image in 2.2 ms, achieving a performance of 16.4 GOPS.
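The abstract mentions a quantitative model for estimating runtime under different parallel factors, but does not give its equations. The sketch below shows one common form of such an estimate; the function names and parameters (`pf_in`, `pf_out`, the 150 MHz clock default) are illustrative assumptions, not taken from the thesis:

```python
from math import ceil

def conv_cycles(h, w, cin, cout, k, pf_in, pf_out):
    """Rough cycle estimate for a standard convolution layer when the
    accelerator processes pf_in input channels and pf_out output channels
    per cycle (ideal MAC utilization, memory stalls ignored)."""
    return h * w * k * k * ceil(cin / pf_in) * ceil(cout / pf_out)

def runtime_ms(cycles, freq_hz=150e6):
    """Convert a cycle count to milliseconds at the accelerator clock."""
    return cycles / freq_hz * 1e3
```

Sweeping `pf_in`/`pf_out` over such a model for every layer lets the designer pick factors that balance DSP usage against total runtime before committing to an RTL implementation.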
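The batch-normalization optimization that avoids division and square root is a standard folding: since the BN parameters are constants at inference time, y = γ(x − μ)/√(σ² + ε) + β can be precomputed offline into a single multiply-add y = a·x + b. A minimal sketch of that precomputation (the thesis does not give its exact form):

```python
import math

def fold_bn(gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization into one multiply-add:
    y = gamma*(x - mean)/sqrt(var + eps) + beta  ==  a*x + b.
    The division and square root happen once, offline, so the
    accelerator only needs a multiplier and an adder at runtime."""
    a = gamma / math.sqrt(var + eps)
    b = beta - a * mean
    return a, b
```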
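The middleware's conversion of parameters to signed 16-bit fixed-point numbers can be sketched as below. The thesis does not state the Q format, so the choice of 8 fractional bits here is a hypothetical example; only the int16 width and saturation behavior follow from the abstract:

```python
def to_fixed(x, frac_bits=8):
    """Quantize a float to a signed 16-bit fixed-point integer with
    frac_bits fractional bits, saturating at the int16 range."""
    q = int(round(x * (1 << frac_bits)))
    return max(-32768, min(32767, q))

def from_fixed(q, frac_bits=8):
    """Recover the approximate float value of a fixed-point integer."""
    return q / (1 << frac_bits)
```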
Keywords/Search Tags:Convolutional Neural Network, Deep Learning, MobileNetV2, Hardware Acceleration, FPGA