
Research On Key Technologies Of Real-time OCR For AIoT Chips

Posted on: 2023-12-26  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Gan  Full Text: PDF
GTID: 2568307025972559  Subject: Integrated circuit engineering
Abstract/Summary:
OCR has a wide range of application scenarios and many related commercial products, but there is little published research on how deep-learning-based OCR models can achieve real-time inference on edge devices. Deep-learning-based OCR models usually consist of CNN and RNN/LSTM components, which are computationally intensive and carry many weight parameters, so running inference on edge devices at the required performance demands substantial computational resources. General-purpose processors such as CPUs and GPUs cannot meet both the processing-speed and power requirements, and they are costly. With the popularity of deep learning, neural processing units (NPUs) are becoming common in many embedded and edge devices; they offer high throughput and strong computational power for the matrix operations involved in neural networks. In this study, the OCR model is compressed to reduce its network redundancy and memory footprint, the compressed OCR model is deployed on the NPU, and an acceleration strategy is further designed around the network structure of the OCR model to meet the real-time inference requirements on the AIoT chip. The specific work is as follows.

(1) This dissertation proposes block-based fine-grained weight pruning, which solves the unbalanced workload and low pruning rate that weight pruning brings to edge devices, and uses a dynamic progressive pruning method that updates the pruning threshold during model training so that the accuracy of the original model can be recovered.

(2) The compression effect of a single compression algorithm is limited. This dissertation therefore combines two compression algorithms, pruning and quantization, and adopts a KL-divergence-based quantization method to address the difficulty of floating-point operations on hardware and the quantization error caused by extreme values in an unbalanced parameter distribution.

(3) An acceleration strategy is further designed based on the network structure of the OCR model. Because the neural-network accelerator cannot execute the LSTM layer directly, a method is proposed to deploy the LSTM layer on the accelerator by transforming it into conventional layers that the accelerator can process. Since implementing exact nonlinear functions in hardware is difficult, a combination of piecewise polynomial fitting and LUT lookup tables is used to obtain the results of activation-function operations and improve the model's inference speed on the hardware.

Experimental results show that pruning-quantization compression quantizes the 32-bit floating-point parameters of the OCR model to 8 bits, with a pruning rate of 78%. Comparing model accuracy, model size, and inference time, the accuracy drop is less than 3%; the text-detection CTPN model is compressed from 67.6 MB to 12.52 MB and achieves 36.35x and 7x latency speedups on the NPU over the CPU and GPU implementations, respectively; the text-recognition CRNN model is compressed from 15.87 MB to 3.13 MB and achieves 28.87x and 6.1x latency speedups on the NPU over the CPU and GPU implementations, respectively.
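To make contribution (1) concrete, below is a minimal NumPy sketch of block-based fine-grained pruning with a progressive sparsity schedule. The abstract does not give the block shape or the schedule; the 4x4 blocks, the cubic ramp, and all function names here are illustrative assumptions, not the dissertation's actual design.

```python
import numpy as np

def prune_blockwise(weights, block=(4, 4), sparsity=0.78):
    """Block-based fine-grained pruning (sketch): within every block,
    zero the smallest-magnitude weights so each block keeps roughly the
    same number of nonzeros, balancing the workload on the accelerator."""
    out = weights.copy()
    rows, cols = weights.shape
    keep = int(round(block[0] * block[1] * (1.0 - sparsity)))
    for r in range(0, rows, block[0]):
        for c in range(0, cols, block[1]):
            blk = out[r:r + block[0], c:c + block[1]]
            flat = np.abs(blk).ravel()
            if keep < flat.size:
                thresh = np.sort(flat)[-keep] if keep > 0 else np.inf
                blk[np.abs(blk) < thresh] = 0.0
    return out

def progressive_sparsity(step, total_steps, final=0.78, start=0.0):
    """Dynamic progressive schedule (sketch): the target sparsity grows
    smoothly from `start` to `final` during training, so accuracy can
    recover between pruning updates."""
    t = min(step / total_steps, 1.0)
    return final + (start - final) * (1.0 - t) ** 3  # cubic ramp-up
```

Because every block retains the same number of weights, the sparse matmul can be tiled evenly across the accelerator's compute units, which is the workload-balance property the abstract highlights.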
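Contribution (2) mentions KL-divergence-based quantization. The sketch below follows the widely known entropy-calibration idea (popularized by TensorRT): pick the clipping threshold whose quantized histogram is closest in KL divergence to the original, which tolerates extreme outliers better than plain min/max scaling. Bin counts, level counts, and function names are assumptions, not the dissertation's actual parameters.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P||Q) over histogram bins, ignoring empty bins of P."""
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    q = np.where(q > 0, q, 1e-12)  # avoid log(0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def best_clip_threshold(activations, n_bins=2048, n_levels=128):
    """KL-based int8 calibration (sketch): search clipping thresholds and
    keep the one whose quantized distribution best matches the original."""
    hist, edges = np.histogram(np.abs(activations), bins=n_bins)
    best_kl, best_t = np.inf, edges[-1]
    for i in range(n_levels, n_bins + 1):
        p = hist[:i].astype(np.float64).copy()
        p[-1] += hist[i:].sum()              # fold the clipped tail in
        # simulate quantization: merge bins into n_levels, then re-expand
        q = np.zeros(i)
        for idx in np.array_split(np.arange(i), n_levels):
            total = hist[idx].sum()
            nonzero = (hist[idx] > 0).sum()
            if nonzero:
                q[idx] = np.where(hist[idx] > 0, total / nonzero, 0)
        if q.sum() == 0:
            continue
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_kl, best_t = kl, edges[i]
    return best_t  # int8 scale would be best_t / 127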
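Contribution (3) maps the LSTM onto an accelerator that only supports conventional layers. A common way to do this, shown below as an assumed sketch rather than the dissertation's exact mapping, is to fuse the four gate projections into one large fully-connected layer followed by elementwise operations, both of which an NPU can typically execute.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_as_matmul(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step as a single matmul plus elementwise ops (sketch).
    Shapes: W (4H, D), U (4H, H), b (4H,). Assumed gate order: input,
    forget, cell candidate, output."""
    gates = W @ x_t + U @ h_prev + b      # one fused FC layer for all gates
    H = h_prev.shape[0]
    i = sigmoid(gates[0:H])
    f = sigmoid(gates[H:2 * H])
    g = np.tanh(gates[2 * H:3 * H])
    o = sigmoid(gates[3 * H:4 * H])
    c = f * c_prev + i * g                # elementwise, NPU-friendly
    h = o * np.tanh(c)
    return h, c
```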
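For the activation functions, the abstract combines piecewise polynomial fitting with a LUT. The sketch below uses degree-1 (linear) segments between LUT entries for brevity; the input range, table size, and clamping behavior are illustrative assumptions.

```python
import numpy as np

def build_tanh_lut(x_min=-4.0, x_max=4.0, n=256):
    """Uniform LUT over [x_min, x_max]; outside this range tanh is
    effectively saturated, so inputs are clamped (parameters illustrative)."""
    xs = np.linspace(x_min, x_max, n)
    return xs, np.tanh(xs)

def tanh_lut(x, xs, ys):
    """Piecewise-linear interpolation between LUT entries: one multiply
    and one add per lookup, which is cheap to realize in hardware."""
    step = xs[1] - xs[0]
    x = np.clip(x, xs[0], xs[-1])
    idx = np.minimum(((x - xs[0]) / step).astype(int), len(xs) - 2)
    t = (x - xs[idx]) / step
    return ys[idx] + t * (ys[idx + 1] - ys[idx])
```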
Keywords/Search Tags:AIoT, Model quantization, Network pruning, Neural network acceleration