
Scene Chinese Text Recognition Based On Dual Attention Mechanism

Posted on: 2021-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Chen
Full Text: PDF
GTID: 2428330611465318
Subject: Electronic and communication engineering
Abstract/Summary:
As an important carrier of human communication, text contains rich semantic information, so recognizing and understanding text in images is of great significance. With the rapid development of artificial intelligence, scene text recognition technology based on deep learning is advancing rapidly. However, current methods still suffer from insufficient recognition accuracy and a limited ability to recognize deformed characters, and therefore remain far from practical application.

Existing scene text recognition algorithms have the following problems: (1) the feature extraction network cannot adapt well to scene text input images; (2) existing techniques cannot directly use explicit language models to fully mine semantic information; (3) recognition algorithms based on one-dimensional encoder-decoder networks cannot directly process two-dimensional images. This thesis therefore proposes a scene Chinese text recognition method based on a dual attention mechanism and binary associated semantic information, comprising a feature extraction module, an encoder-decoder module, a binary associated semantic information module, and a dual attention network module. The specific contributions are as follows:

1. Current feature extraction networks cannot handle small-sized text input images well. A multi-scale fusion residual network is proposed to effectively improve feature extraction. Building on ResNet's skip connections ("jumpers"), which add the input of a convolutional layer to its output, the input and output feature maps are also channel-spliced, fusing feature-map information at different scales. Since the number of skip connections does not increase, overfitting is unlikely.

2. To use a language model effectively, this thesis draws on the factorization machine algorithm and proposes a binary associated semantic model that explicitly learns sequential semantic information and step-by-step semantic information. When predicting a character in the sequence, the previously predicted character vectors are multiplied pairwise, yielding the binary associated semantic information, which is then used to guide the generation of the current character. Compared with an LSTM, which can only learn sequential information implicitly, this model mines semantic information more effectively.

3. For the recognition of irregular text, a scene text recognition model based on a dual attention mechanism is proposed. It processes two-dimensional image features and one-dimensional sequence features simultaneously, so as to handle the recognition of distorted text. The dual attention mechanism weights the sequence features with sequence attention weights and obtains one-dimensional sequence information through the encoder; at the same time, it directly weights the two-dimensional image features with image attention weights to obtain two-dimensional image information. Finally, it combines the sequence and image information for recognition, supplementing the original sequence-based single attention mechanism with complementary information.

This thesis evaluates the proposed model on the Chinese scene text datasets MTWI and Baidu OCR. The experimental results show improvements of 2% and 6%, respectively, over the baseline network on the two datasets. Compared with the industry-leading method SAR on the same datasets, the improvements are 0.7% and 2.9%, respectively, which verifies the effectiveness of the proposed method.
Keywords/Search Tags:Scene Chinese text recognition, Multi-scale fusion residual network, Binary associated semantic information, Dual attention mechanism