Research On Burmese Text Detection And Recognition Methods In Complex Scenario

Posted on: 2023-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: F H Liu
GTID: 2555306797481734
Subject: Electronic and communication engineering
Abstract/Summary:
Burmese optical character recognition (OCR) aims to extract text from images. Research on Burmese OCR is significant for advancing image processing tasks in low-resource languages. OCR for Chinese and English is widely used in image retrieval, image translation, and image understanding, but there is still little research on Burmese OCR. Text detection and text recognition are the two main components of an OCR system, and their performance directly determines the accuracy of the whole system, so it is important to improve both Burmese text detection and Burmese text recognition.

Applying existing text detection and recognition methods to Burmese raises the following problems:

(1) There is no published dataset for Burmese image text detection and recognition.
(2) Burmese text in images has complex character composition, multiple scales, multiple orientations, and diverse backgrounds, so general text detection methods cannot accurately locate it.
(3) Unlike common languages such as Chinese and English, Burmese has complex characters in which multiple characters are nested and combined within a single receptive field, which makes it difficult for conventional recognition models to recognize the text accurately.
(4) In photographed images, scanned text, and natural scenes, text backgrounds are complex and fonts are varied.

To address these problems, this thesis studies Burmese text detection and recognition methods for complex scenes and builds a Burmese text detection and recognition system. The main research work is as follows:

(1) Feature analysis of Burmese images and corpus construction
To address the scarcity of image datasets for Burmese text detection and recognition, this thesis designs an image synthesis algorithm for scenes such as photographs, screenshots, and scanned text. It analyzes the characteristics of Burmese images in these scenes and designs the implementation accordingly. To provide data support for follow-up research, the algorithm is used to expand the text recognition dataset, and part of the Burmese text detection and recognition dataset is annotated manually. In total, 3,000 Burmese text detection samples and 5 million Burmese text recognition samples were collected.

(2) Burmese text detection based on pre-training and instance segmentation
To address the low accuracy of text line detection caused by diverse text backgrounds and the particular structure of Burmese characters, a text segmentation model for Burmese images is built from a convolutional neural network and a feature pyramid network. An attention mechanism captures global information between pixels, and information on the degree of connection between pixels is integrated into the network to improve Burmese text detection performance. To address the insufficient detection performance caused by the scarcity of Burmese image data, a detection method based on a pre-trained model is proposed: the parameters of a feature extraction network trained on an English dataset are transferred to the Burmese image text segmentation network, strengthening its ability to extract text features from images. This reduces the false detections, missed detections, and edge loss that data scarcity causes in Burmese image text detection. Experimental results show that the method performs well on the Burmese image text detection dataset, with precision, recall, and H-mean of 92.1%, 88.1%, and 90.1%, respectively.

(3) Burmese image text recognition based on multi-head attention
To address the insufficient representational ability of feature maps after Burmese image encoding, this thesis proposes a feature extraction backbone network that fuses multi-layer semantic feature maps. Within an encoder-decoder framework, multi-layer feature maps from the deep convolutional neural network are up-sampled and fused, giving the network stronger representational ability in the feature extraction stage. To address the loss of Burmese characters that occurs when the recognition model cannot attend fully to the features of combined characters, a visual attention model based on multi-head attention is proposed; it computes attention over the visual features and suppresses noise in the image, yielding a more accurate semantic feature representation. To address the weaknesses of decoders based on recurrent neural networks and attention, which compute the attention distribution of long text poorly and parallelize poorly, a decoding unit based on multi-head attention is designed, and these units are combined into a decoder that converts the feature sequence into Burmese characters. Experimental results show that the method improves on the baseline model by 3% on the Burmese image text recognition dataset.

(4) Burmese image text detection and recognition system
Based on the research above, a Burmese image text detection and recognition system is built to obtain the coordinates and content of Burmese text in images. The system consists of two parts, a Burmese text detection model and a Burmese text recognition model, and it performs Burmese text detection and recognition in scenes such as photographs, screenshots, and scanned text.
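The three detection figures reported above can be checked against each other. The sketch below assumes that H-mean is the usual harmonic mean of precision and recall used in text detection benchmarks, and that the first reported figure (92.1%) is the precision. The function name `h_mean` is illustrative and does not come from the thesis.

```python
def h_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
score = h_mean(92.1, 88.1)
print(round(score, 1))
```

Under these assumptions the result rounds to 90.1, which matches the reported H-mean.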
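The recognition method above relies on multi-head attention. The abstract does not give the thesis's architecture, so the following is only a minimal pure-Python sketch of the generic mechanism: scaled dot-product attention computed separately on slices of the projected features, then concatenated and projected again. The projection matrices `wq`, `wk`, `wv`, `wo` and every function name here are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Self-attention over the rows of x, split into num_heads feature slices."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_model = len(q[0])
    assert d_model % num_heads == 0, "feature size must divide evenly into heads"
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        heads.append(attention([r[sl] for r in q],
                               [r[sl] for r in k],
                               [r[sl] for r in v]))
    # Concatenate the per-head outputs position by position, then project.
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, wo)


# Toy input: 3 positions with 4 features each, identity projections, 2 heads.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0], [2.0, 0.0, 2.0, 0.0]]
out = multi_head_attention(x, identity, identity, identity, identity, num_heads=2)
print(len(out), len(out[0]))
```

Because each head attends over all positions at once, the computation has no step-by-step dependency of the kind a recurrent decoder has, which is the parallelism advantage the abstract refers to.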
Keywords/Search Tags: Burmese, Text Detection, Instance Segmentation, Pre-training, Multi-Head Attention, Text Recognition