
Research on MobileNet V3-Based Low Bit Width Customizable Computing

Posted on: 2022-08-07    Degree: Master    Type: Thesis
Country: China    Candidate: Z D Niu    Full Text: PDF
GTID: 2518306572958989    Subject: Instrumentation engineering
Abstract/Summary:
Owing to their complicated structures and vast numbers of parameters, Convolutional Neural Networks (CNNs) have shown excellent performance in applications such as image classification and target detection. However, this also poses huge challenges for deploying CNNs on edge hardware, where computing and storage resources are limited. Model compression can reduce both the computational complexity and the size of a CNN, and therefore offers a feasible path toward resource-efficient edge deployment. Parameter quantization is one of the effective model compression methods: by representing the typical 32-bit floating-point parameters with a lower bit width, quantization enables a CNN to perform inference with less computational resource consumption, creating opportunities to raise peak inference performance.

Driven by the needs of our research team's project, this thesis uses MobileNet V3 for image classification. Since most of the computation of MobileNet V3 is concentrated in the convolutional layers, and in order to accelerate computation, this thesis focuses on quantizing the parameters of the convolutional layers of MobileNet V3, then studies a customized computing structure for the low bit width parameters of the quantized model, and finally uses an FPGA to realize a computing system based on data of specific bit widths. With low bit width parameters, convolution can be computed with higher parallelism than with floating-point parameters under the same logic resource constraints, thereby speeding up inference of the whole model. The specific research contents are as follows.

(1) During inference of MobileNet V3, fetching weights and intermediate results causes a large amount of off-chip data access, which introduces high latency. To solve this problem, we compress the model to reduce the data exchange between off-chip and on-chip memory, thereby avoiding the additional energy consumption caused by inefficient memory access. The experimental results show that the lightweight processing scheme in this thesis reduces the model size by more than 20 times, while the loss in image classification performance is less than 3%.

(2) Since each convolutional layer of the model plays a different role in feature extraction, we use mixed precision quantization to obtain a combination of bit widths that matches each layer's feature extraction ability. This thesis proposes Absolute value mean aware Weight Quantization (AWQ), which determines the quantization bit width of each convolutional layer's parameters according to its importance, achieving a balance between low bit width computation and the model's feature extraction capability. The experimental results show that after AWQ the convolutional layers can be computed entirely with integer data, and the loss in image classification performance is no more than 10%.
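The abstract names the AWQ criterion (per-layer importance measured from the mean absolute value of the weights) but does not spell out the exact rule, so the following is only a minimal Python sketch of a mixed precision scheme in that spirit: bit widths are assigned per layer from the weight statistics, and each layer is then quantized symmetrically to its assigned integer width. The function names, candidate bit widths, and the even split of layers across widths are illustrative assumptions, not the thesis's actual algorithm.

```python
import numpy as np

def assign_bit_widths(layer_weights, candidate_bits=(8, 6, 4)):
    """Assign a quantization bit width to each conv layer (AWQ-style sketch)."""
    # Importance proxy: mean absolute value of each layer's weights.
    importance = np.array([np.mean(np.abs(w)) for w in layer_weights])
    # Rank layers from most to least important.
    order = np.argsort(-importance)
    bits = np.empty(len(layer_weights), dtype=int)
    # Illustrative policy: split ranked layers evenly across the candidate
    # widths, giving the widest integers to the most important layers.
    for width, idx in zip(candidate_bits, np.array_split(order, len(candidate_bits))):
        bits[idx] = width
    return bits

def quantize_symmetric(w, n_bits):
    """Symmetric uniform quantization of a float tensor to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale
```

For example, `bits = assign_bit_widths([w1, w2, w3])` followed by `quantize_symmetric(w1, bits[0])` would yield integer weights plus a per-layer scale factor of the kind an integer convolution datapath can consume.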
(3) After mixed precision quantization, the parameters of each convolutional layer have different scales, and a generic convolution unit cannot cover the computation range of all layers. To solve this problem, we design convolution units with custom bit widths. After the mixed precision quantization, we analyze the bit widths required for the input and output of each convolution computation step and design a customizable computing unit; we then adopt algorithmic optimizations to make full use of the FPGA logic resources and achieve convolution computation with high parallelism and low latency. The performance of the FPGA accelerator is verified on our research team's remote sensing image processing project. The experimental results show that the low bit width customizable FPGA accelerator achieves a computing performance of 8.11 GOPS, which is 48% higher than unquantized 32-bit floating-point convolution.
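To illustrate the kind of per-step bit width analysis described in (3), the sketch below sizes the accumulator of an integer convolution from the operand widths and the number of accumulated products. This is a generic worst-case sizing rule stated under our own assumptions; the thesis's actual analysis and hardware design are not given in this abstract.

```python
import math

def accumulator_bits(w_bits, a_bits, kernel_size, in_channels):
    """Worst-case accumulator width for an integer convolution step."""
    # Each product of an n-bit weight and an m-bit activation needs up to
    # n + m bits; summing K*K*C such products adds at most
    # ceil(log2(K*K*C)) further bits.
    num_terms = kernel_size * kernel_size * in_channels
    return w_bits + a_bits + math.ceil(math.log2(num_terms))
```

For instance, a 3x3 kernel over 16 input channels with 4-bit weights and 8-bit activations gives `accumulator_bits(4, 8, 3, 16) == 20`, so that layer's multiply-accumulate datapath can be tailored to 20 bits instead of a one-size-fits-all 32-bit unit.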
Keywords/Search Tags: MobileNet V3, mixed precision quantization, Field Programmable Gate Array, low bit width customizable computing