
Research On Encoder-Decoder Model For Complex Structure Text Recognition

Posted on: 2022-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: C J Wu
GTID: 2518306323979469
Subject: Cyberspace security
Abstract/Summary:
Text recognition technology aims to automatically recognize text content from images or online trajectory points. On the one hand, text recognition is closely related to cyberspace security. It can automatically break verification codes in image or trajectory format, detect terrorist information and illegal advertisements embedded in pictures, and identify vehicle license plate numbers and ID numbers. On the other hand, text recognition can greatly improve work efficiency and promote social and economic development. For example, applications such as handwriting input methods, document scanning, camera translators, and machine scoring have brought great convenience to people.

Complex text structure is one of the main difficulties in text recognition. Unlike ordinary text lines, complex structure texts contain a variety of spatial relationships in addition to the basic text symbols. For example, in mathematical formulas, the radicand lies inside the radical sign, and the radical exponent is located at the upper left of the radical sign. These spatial structures increase the diversity of text representation, but they also greatly increase the difficulty of recognition. This dissertation takes Chinese characters and formulas as examples to study complex structure text recognition based on the encoder-decoder model.

The dissertation approaches the problem from three perspectives: optimization of the encoder, improvement of the attention mechanism, and reconstruction of the decoder. The goal is a complex structure text recognition model with better performance and better robustness.

1. This work incorporates a spatial transformer network into the encoder to improve the encoder's ability to extract features from distorted text, so that the model is more robust when recognizing distorted text. Taking distorted Chinese character recognition as an example, it proposes a joint spatial and radical analysis network. The proposed model can rectify and recognize rotated or distorted Chinese characters well.

2. This work uses a posterior attention mechanism to improve the soft attention mechanism and help the model obtain better alignment information between input and output. Taking online handwritten mathematical formulas as an example, it proposes an encoder-decoder model that uses a stroke-based posterior attention mechanism. The posterior attention mechanism effectively improves the alignment between model input and output, and therefore improves the recognition performance of the model.

3. This work constructs a general tree decoder that can decode any text convertible into a tree structure, which effectively improves the generalization of the model for complex structure text recognition. The node classification module and branch prediction module in this new tree decoder learn the spatial relationships between character nodes in complex structures more directly. Extensive experiments on mathematical formulas and Chinese characters show that the new tree decoder has better generalization ability and higher performance on complex structure text.
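To make the idea of tree-structured text concrete, the radical example from the abstract can be written as a small tree whose edges carry spatial relations. This is only an illustrative sketch, not the thesis's implementation: the `Node` class, the `to_triples` helper, and the relation names `upper_left` and `inside` are hypothetical names chosen here for readability.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One symbol in a complex-structure text (hypothetical representation)."""
    symbol: str                      # character or structural symbol, e.g. "\\sqrt"
    relation: Optional[str] = None   # spatial relation to the parent (None for the root)
    children: List["Node"] = field(default_factory=list)

    def add(self, symbol: str, relation: str) -> "Node":
        """Attach a child with the given spatial relation and return it."""
        child = Node(symbol, relation)
        self.children.append(child)
        return child


def to_triples(root: Node) -> List[Tuple[str, str, str]]:
    """Depth-first list of (symbol, parent symbol, relation) triples."""
    triples = []
    stack = [(root, "<root>")]
    while stack:
        node, parent_symbol = stack.pop()
        triples.append((node.symbol, parent_symbol, node.relation or "start"))
        # Push children in reverse so they are visited in insertion order.
        for child in reversed(node.children):
            stack.append((child, node.symbol))
    return triples


# Cube root of x: the exponent 3 sits at the upper left of the radical sign,
# and the radicand x sits inside it (relation names are assumptions).
root = Node("\\sqrt")
root.add("3", "upper_left")
root.add("x", "inside")

triples = to_triples(root)
for triple in triples:
    print(triple)
```

Under these assumptions, a tree decoder would have to predict, for each node, both the symbol and how it attaches to its parent, which roughly mirrors the split into node classification and branch prediction described in the third contribution.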
Keywords/Search Tags:Chinese character recognition, mathematical expression recognition, encoder-decoder, posterior attention mechanism, tree decoder