
Research On Convolutional Neural Network Accelerator For Mobile Terminals

Posted on: 2022-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: C M Zeng
Full Text: PDF
GTID: 2518306551970639
Subject: Master of Engineering
Abstract/Summary:
Convolutional neural networks have become an excellent solution in a wide range of scenarios, and there is real demand for deploying convolutional neural network products on mobile terminal devices, for example in short-video effects, smart drones, smart cameras, and wild-herb recognition. Settings with no network, a weak network, or a forbidden network, such as tunnels, caves, and military sites, place especially hard requirements on on-device inference. Convolutional neural network products must store and compute large numbers of floating-point values, placing heavy demands on memory, computing power, and power consumption, so deploying them on mobile terminals requires optimizing resource consumption. This thesis studies how to bring academic results on convolutional neural networks into industrial products on mobile terminal devices at lower cost and with higher efficiency. The optimization proceeds on two levels: first, compress the existing network model with pruning and quantization to reduce hardware resource requirements; second, design a dedicated neural-network computing unit that accelerates the forward-inference computation of the network model in a targeted manner.

For model compression, this thesis focuses on reducing the error introduced during quantization and proposes L2Q, a model quantization method based on the idea of minimum error. The method minimizes the error caused by quantization so that the parameter distribution of the quantized model remains close to the original distribution. For the AI computing unit, this thesis studies the parallelism and memory-access characteristics of convolutional neural network operations, designs and implements an FPGA-based convolution acceleration IP core on a Xilinx heterogeneous SoC, and integrates it as a heterogeneous collaborative computing unit alongside the ARM processor, which together complete the forward inference
operation of the convolutional neural network. The detailed work is as follows:

1. To reduce the error introduced by quantizing the network model, this thesis starts from the idea of error minimization. First, the L2 norm is used to represent the cumulative error between the model before and after quantization. Then an iterative method combined with KL divergence determines the linear scaling factor at which the cumulative error is smallest. Finally, the set of floating-point parameters in the network model is converted to fixed point, compressing the network model and reducing its resource requirements.

2. To improve memory-access efficiency and parallel-computing efficiency during model computation, this thesis proposes, on top of tensor data in the BC4HW4 format, an efficient general convolution algorithm realizable on an FPGA, and implements it as a general-purpose convolution acceleration IP core.

3. To raise the computation speed of the network model, this thesis combines multiple general-purpose convolution acceleration IP cores into an acceleration array, which forms a heterogeneous computing system with the ARM processor and jointly improves the data throughput of the whole accelerator. Compared with existing FPGA accelerators, the single convolution acceleration IP core designed here has lower data throughput but a higher operating frequency, and the multi-core array improves parallelization and the data throughput of the accelerator system. With four convolution acceleration IP cores deployed on a ZCU102 evaluation board running at 287 MHz, the accelerator reaches a throughput of 1096.86 GOP/s, an energy-efficiency ratio of 54.06, and a VGG-16 acceleration ratio of 2.46.
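The idea behind item 1, choosing the linear scaling factor that minimizes the L2 norm of the quantization error, can be illustrated with a minimal sketch. This is not the thesis's L2Q implementation (which additionally uses KL divergence); it only shows the core step: iterating over candidate scales for symmetric fixed-point quantization and keeping the one with the smallest cumulative L2 error. The bit width, candidate grid, and function name are illustrative assumptions.

```python
import numpy as np

def l2_min_scale(weights, n_bits=8, n_steps=100):
    """Search for a linear scaling factor minimizing the L2 norm of the
    quantization error (symmetric fixed-point, illustrative sketch)."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 127 for int8
    w_max = np.abs(weights).max()
    best_scale, best_err = None, np.inf
    # Iterate over candidate scales between 0 and the naive max-based scale,
    # keeping the one whose round-trip error has the smallest L2 norm.
    for step in range(1, n_steps + 1):
        scale = (w_max * step / n_steps) / qmax
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        err = float(np.linalg.norm(weights - q * scale))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err
```

Because the naive choice `w_max / qmax` is itself one of the candidates, the returned scale can never do worse than simply scaling by the maximum absolute weight.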
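Item 2's BC4HW4 layout is not spelled out in the abstract; assuming the name denotes a channel-blocked layout (batch, C/4, H, W, 4), where channels are grouped in vectors of four so that each inner-loop access reads one contiguous 4-wide word, a repacking from the usual NCHW order could be sketched as follows. The interpretation of the format and the function name are assumptions made for illustration.

```python
import numpy as np

def nchw_to_bc4hw4(x):
    """Repack an NCHW tensor into an assumed BC4HW4 layout: (N, C/4, H, W, 4).
    Channels are padded to a multiple of 4 and grouped in vectors of 4 so a
    convolution kernel can fetch 4 channels per contiguous memory word."""
    n, c, h, w = x.shape
    pad = (-c) % 4                                    # pad channels up to a multiple of 4
    x = np.pad(x, ((0, 0), (0, pad), (0, 0), (0, 0)))
    # Split channels into (blocks, 4), then move the 4-vector to the innermost axis.
    return x.reshape(n, (c + pad) // 4, 4, h, w).transpose(0, 1, 3, 4, 2)
```

With this layout, channel index `c` lands at block `c // 4`, lane `c % 4`, which is what makes the innermost dimension a natural match for a 4-wide datapath on the FPGA.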
Keywords/Search Tags:model compression, accelerator, convolutional neural network, quantization