Text is one of the main sources of information in people's daily lives, and text detection and recognition have long been important tasks in computer vision. Scene text is difficult to process because of complex backgrounds, varied text formats and uncertain imaging conditions. Scene text detection and recognition algorithms based on deep learning are robust and generalize well, and they are widely used in intelligent transportation systems, the identification of bills and certificates, and other scenarios. However, their complex structures and large numbers of parameters make mainstream models difficult to deploy on embedded platforms, whose resources are limited. This thesis therefore studies and designs lightweight scene text detection and recognition algorithms and deploys them on an embedded platform with low cost, low power consumption and limited computing power.

First, we study a lightweight scene text detection algorithm that detects multi-oriented quadrilateral text. The feature extraction layer obtains feature information from images. The feature fusion layer fuses multi-scale features. The prediction layer classifies pixels as text or non-text, and text boxes are extracted from the approximate binary map in post-processing. Comparative experiments are carried out on the ICDAR2015 dataset, and the performance of the model is further verified on the MSRA-TD500 dataset. Compared with the baseline, our algorithm reduces the model size to about 4% of the baseline and increases the frame rate by about 46% to 69%, with only a small performance loss.

Second, we study a lightweight scene text recognition algorithm that recognizes horizontal rectangular text. The feature extraction layer obtains visual features of the text. The sequence modeling layer extracts contextual features of the text sequence. The prediction layer is based on connectionist temporal classification (CTC) and outputs the text with the maximum posterior probability. Comparative experiments are conducted on multiple datasets to verify model performance. Compared with the baseline, our algorithm reduces the model size to about 4% of the baseline and maintains inference speed, with only a small performance loss.

Third, we design an embedded scene text detection and recognition system. The models are trained on a license plate recognition dataset. The OpenVINO Model Optimizer converts and quantizes the models. After the models are transferred to a Raspberry Pi, the OpenVINO Inference Engine drives an Intel Neural Compute Stick 2 (NCS2) for hardware acceleration. A perspective transformation corrects skewed text and connects the text detection and text recognition modules designed in this thesis. In tests, the embedded system processes 3 images per second, is robust to different shooting angles and distances, and achieves high accuracy. It is suitable for license plate recognition at the entrances of parking lots.
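
The CTC-based prediction step mentioned above can be illustrated with a minimal sketch. It assumes greedy (best-path) decoding, a common approximation to choosing the text with the maximum posterior probability; the thesis does not state which decoder is used. The function name `ctc_greedy_decode`, the blank index 0 and the mapping of class `i` to `charset[i - 1]` are hypothetical choices made for this example only.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    probs: Sequence[Sequence[float]], charset: str, blank: int = 0
) -> str:
    """Best-path CTC decoding (assumed decoder, not confirmed by the thesis).

    probs:   one probability vector per time step; class 0 is assumed to be
             the CTC blank and class i (i >= 1) maps to charset[i - 1].
    """
    # 1. Pick the most probable class at every time step.
    best_path: List[int] = [
        max(range(len(step)), key=step.__getitem__) for step in probs
    ]

    # 2. Merge consecutive repeats, then drop blanks.
    chars: List[str] = []
    prev = blank
    for idx in best_path:
        if idx != blank and idx != prev:
            chars.append(charset[idx - 1])
        prev = idx
    return "".join(chars)


# Toy input over the classes [blank, "A", "B"]; the per-step argmax is
# A, A, blank, A, B, B, which collapses to "AAB".
example_probs = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
    [0.20, 0.10, 0.70],
]
print(ctc_greedy_decode(example_probs, "AB"))  # prints "AAB"
```

The blank between the second and third "A" is what allows a repeated character to survive the merge step, which matters for license plates containing doubled characters.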