Research On Optical Character Recognition Text Error Correction Based On Pretrained Model

Posted on:2024-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:Q K Xu

Full Text:PDF

GTID:2568307079459504

Subject:Computer Science and Technology

Abstract/Summary:

The mass digitization of textual resources, such as books, newspaper articles, and cultural archives, has been underway for decades, making these valuable resources publicly available for research. Optical character recognition (OCR) is among the most widely used techniques for converting printed documents into machine-readable formats. Converting document images into machine-readable text provides a practical means of exploring large document collections with automated tools such as text indexing, search, and machine translation. Although OCR engines perform well on modern text, their performance degrades significantly on historical materials. Furthermore, a significant portion of existing text was digitized with outdated technologies. Deep learning has made remarkable progress in OCR in recent years, yet OCR applications still frequently produce recognition errors. Such errors not only make the text difficult to read and understand but also diminish its informational value. In fields such as finance, misrecognition can have significant financial consequences. As a result, reducing the OCR error rate has become a major concern for both academia and industry.

Existing OCR text correction solutions face two major challenges. First, they primarily focus on correcting OCR errors in pure text images. For non-structured document images such as invoices or bank statements, they struggle to use the semantic information of the documents themselves for error correction. Second, publicly available OCR correction datasets are scarce, and those that exist contain a limited number of samples, which makes model training difficult.

To address the first issue, this thesis proposes an OCR text error correction method based on LayoutLM, a pre-trained document understanding model. The method uses a multimodal encoder with a spatial-aware self-attention mechanism, which enables the model to deeply integrate the text, visual, and layout information of document images and thus understand documents at a fine-grained level. To address the second issue, inspired by the task of grammatical error correction, this thesis designs a bidirectional inference network with a critic, following the Break-It-Fix-It (BIFI) architecture. Despite limited data availability, the BIFI architecture has achieved impressive results in grammatical error correction. Building on LayoutLMv2, a pre-trained model for multimodal document understanding developed by Microsoft, this thesis designs layoutlm-critic, a discriminator that evaluates the alignment between OCR text, images, and bounding boxes. The BIFI architecture was trained on top of layoutlm-critic and achieved good results on both the SROIE and CORD datasets: the F_0.5 score improved by 9% in the unsupervised setting and by 12% in the supervised setting.
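For readers unfamiliar with the F_0.5 score reported above, the following is a minimal sketch of how an F-beta score is computed from correction counts. The function name `f_beta` and the example counts are hypothetical, and the abstract does not state how the thesis counts true and false corrections, so this illustrates only the standard formula, not the thesis's evaluation code.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Standard F-beta score from counts of corrections.

    tp: corrections the system made that match the reference
    fp: corrections the system made that do not match the reference
    fn: reference corrections the system missed
    beta < 1 weights precision more heavily than recall.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: precision = 0.8, recall = 0.5.
    print(f"F_0.5 = {f_beta(8, 2, 8):.4f}")
    print(f"F_1   = {f_beta(8, 2, 8, beta=1.0):.4f}")
```

With beta = 0.5 the score rewards a corrector that makes few wrong edits more than one that catches every error, which is a common choice in error correction tasks where a bad "fix" is costlier than a missed one.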

Keywords/Search Tags:

OCR, Text Error Correction, Pretrained Model, Attention Mechanism, Multimodal, BIFI

Related items

1	Research On Chinese Text Error Correction Method Based On Multimodal Information
2	Research On Multimodal Sentiment Analysis Based On Joint Learning Of Image-text Features
3	Research On Error Correction Of News Text Based On Masked Language Model
4	Research On Computer Virus Signature Automatic Extraction Technique
5	Multimodal Sentiment Analysis For Text, Audio And Video
6	Chinese Spelling Error Correction Algorithm Incorporating Multimodal Semantic Features And Applications
7	Research Of Chinese Text Correction Based On Neural Machine Translation
8	A Model Of Pinyin Input Method With Error Correction Function Based On Neural Network
9	Design And Implementation Of Text Error Correction System Based On Text Extraction From Distributed Video Stream
10	Research On Multimodal Deep Learning Algorithm Based On Attention Mechanism