As an important foundational technology for the practical deployment of Neural Networks (NNs), NN compression mainly includes quantization, distillation, and pruning. Among them, quantization is an effective way to compress the model size and to reduce system power consumption and latency. This thesis focuses on NN quantization algorithms and uses a Field Programmable Gate Array (FPGA) platform for algorithm deployment and system performance verification. First, the theoretical basis of NN quantization, including the common notation and concepts used in this field, is introduced. On this basis, and considering the deployment scenario of edge-side FPGAs, the thesis discusses and addresses the problems of imbalanced data distribution in Post-Training Quantization (PTQ) and software-hardware alignment in Quantization-Aware Training (QAT). The contributions of the thesis are as follows:

(1) For the PTQ scenario, two algorithms, cross-layer weight equalization and intra-layer activation-weight equalization, are proposed to address the imbalanced distributions of weights and activations, respectively.

(2) For the QAT scenario, an algorithm that re-estimates the statistics of Batch Normalization layers is proposed to improve model accuracy. In addition, by combining an integer-only QAT algorithm with an overflow-aware QAT algorithm, the thesis simulates inter-layer requantization and accumulator overflow during training, minimizing the mismatch between software training and hardware deployment.

(3) The thesis further proposes an NN quantization and implementation platform that realizes the complete conversion from a trained NN in a deep learning framework to an FPGA deployment, providing a tool to verify the effectiveness of the algorithms.

For the single-object detection task on DaJiang Innovations' drones, the thesis deploys the above algorithms on an Ultra96-V2 board in both the PTQ and QAT scenarios and evaluates the effectiveness of the proposed methods. Experimental results demonstrate that the proposed algorithms significantly reduce the model size and improve the Average Intersection Over Union (AIOU) in practical deployment. Under the chosen quantization configuration, the model parameter size is reduced substantially. In the PTQ scenario, equalizing the data ranges raises the AIOU from 69.29% to 71.56%. In the QAT scenario, software-hardware alignment raises the AIOU from 72.35% to 73.54%, almost without any loss compared to the original floating-point model.
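To make the cross-layer weight equalization idea in (1) concrete, the following is a minimal NumPy sketch, not the thesis implementation; all names and shapes are illustrative assumptions. It rescales the output channels of one layer and the matching input channels of the next so that their per-channel weight ranges are balanced, which leaves the composed function unchanged when a ReLU (which is positively homogeneous) sits between the layers.

```python
import numpy as np

def cross_layer_equalize(w1, b1, w2, eps=1e-12):
    """Equalize per-channel weight ranges of two consecutive layers.

    Illustrative sketch: w1 has shape (out_ch, in_ch), b1 (out_ch,),
    w2 (next_out_ch, out_ch). Scaling channel i of layer 1 by 1/s[i]
    and column i of layer 2 by s[i] leaves the ReLU network function
    unchanged while balancing both weight ranges.
    """
    r1 = np.abs(w1).max(axis=1)            # per-output-channel range of layer 1
    r2 = np.abs(w2).max(axis=0)            # per-input-channel range of layer 2
    s = np.sqrt(r1 / np.maximum(r2, eps))  # makes both ranges equal sqrt(r1 * r2)
    s = np.where(s > eps, s, 1.0)          # leave near-dead channels untouched
    w1_eq = w1 / s[:, None]
    b1_eq = b1 / s
    w2_eq = w2 * s[None, :]
    return w1_eq, b1_eq, w2_eq
```

After such a rescaling, a per-tensor quantizer sees similar magnitudes across the channels of both layers, which is exactly the imbalance the PTQ equalization algorithms target.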
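The software-hardware alignment in (2) depends on reproducing hardware effects, such as fixed-width accumulator overflow, inside the training loop. As a hedged illustration of that effect only (assuming a two's-complement accumulator; the function and its parameters are hypothetical, not the thesis code), the sketch below wraps partial sums into a b-bit range so that training observes the same overflow behavior as the deployed hardware.

```python
import numpy as np

def overflow_aware_matmul(x_q, w_q, acc_bits=16):
    """Integer matmul whose running sums wrap like a fixed-width
    two's-complement hardware accumulator.

    x_q: (batch, in_dim) integer array; w_q: (in_dim, out_dim) integer array.
    """
    lo = -(1 << (acc_bits - 1))
    span = 1 << acc_bits
    acc = np.zeros((x_q.shape[0], w_q.shape[1]), dtype=np.int64)
    for k in range(x_q.shape[1]):              # accumulate one product term at a time
        acc += np.outer(x_q[:, k], w_q[k, :]).astype(np.int64)
        acc = (acc - lo) % span + lo           # wrap into [-2^(b-1), 2^(b-1))
    return acc
```

Using a wrapped matmul of this kind in the QAT forward pass lets the training loss penalize weight values whose accumulations overflow; inter-layer requantization can be simulated in the same spirit with an integer multiply followed by a right shift.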