Research On Automatic Proofreading Method Of OCR Recognition Results

Posted on: 2020-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Hao
Full Text: PDF
GTID: 2428330590956615
Subject: Computer Science and Technology
Abstract/Summary:
With the continuing development of intelligent procuratorial work, digitizing paper case files has become an important part of procuratorial informatization. Because automatic recognition technology is imperfect, digitization inevitably introduces a large number of recognition errors, so the resulting electronic files do not meet the accuracy requirements of procuratorial organs. Computer-aided proofreading of digitized text is therefore of practical research value.

This thesis studies automatic proofreading of text produced by OCR. It first analyzes why OCR output contains errors and which types of error occur. On that basis, it studies automatic proofreading for two kinds of word-level error, "non-multi-word errors" and "true multi-word errors", and proposes two proofreading methods for them:

(1) Automatic proofreading method based on joint proofreading. This method divides the proofreading task into automatic error detection and automatic error correction. In the detection stage, out-of-vocabulary words are recorded as suspected errors. In addition, a language model is combined with a sliding-window algorithm to score the degree of correlation of word strings, a confidence value is computed from that score, and rules for judging suspected errors are established according to the degree of correlation. In the correction stage, a corpus and fuzzy matching are combined to select a set of candidate words. The confidence value of each candidate is then combined with its similarity to the original word to produce the best suggested word, which completes the correction.

(2) Automatic proofreading method based on an attention mechanism and an end-to-end sequence model. This method uses a bidirectional Gated Recurrent Unit (GRU) network as the sequence encoder and a GRU as the sequence decoder, and adds an attention mechanism to build a model for automatic text proofreading. The end-to-end model relies on the GRU's ability to retain sequence information and on the attention mechanism's ability to pick out the key information relevant to each point of focus. The model is applied to proofreading text at the sentence level.

Finally, experiments verify the joint proofreading method: the accuracy of error detection is 81.3% and the accuracy of error correction is 79%. A comparative experiment on the method based on the attention mechanism and the end-to-end sequence model shows that introducing the attention mechanism improves the accuracy of automatic text proofreading, which supports the validity of the model.
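The two stages of the joint proofreading method can be illustrated with a small sketch. This is not the thesis's implementation: it assumes a toy English corpus in place of segmented Chinese text, an add-one smoothed bigram model as the language model, a window of one neighbour on each side, and Python's `difflib.SequenceMatcher` as the fuzzy matcher. The names `detect` and `suggest`, the threshold `0.12`, the similarity floor `0.3` and the weight `alpha` are all hypothetical choices made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

# Toy pre-segmented corpus (hypothetical data, for illustration only).
CORPUS = [
    ["the", "court", "accepted", "the", "case"],
    ["the", "court", "reviewed", "the", "case", "file"],
    ["the", "prosecutor", "reviewed", "the", "file"],
]

UNIGRAMS = Counter(w for sent in CORPUS for w in sent)
BIGRAMS = Counter(pair for sent in CORPUS for pair in zip(sent, sent[1:]))
VOCAB = set(UNIGRAMS)


def bigram_prob(prev, word):
    """Add-one smoothed estimate of P(word | prev)."""
    return (BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + len(VOCAB))


def context_score(words, i, word=None):
    """Mean bigram probability of `word` at position i, using one neighbour
    on each side and skipping out-of-vocabulary neighbours."""
    word = words[i] if word is None else word
    probs = []
    if i > 0 and words[i - 1] in VOCAB:
        probs.append(bigram_prob(words[i - 1], word))
    if i + 1 < len(words) and words[i + 1] in VOCAB:
        probs.append(bigram_prob(word, words[i + 1]))
    return sum(probs) / len(probs) if probs else None


def detect(words, threshold=0.12):
    """Return the indices of suspected errors in a segmented sentence."""
    suspects = []
    for i, w in enumerate(words):
        if w not in VOCAB:
            suspects.append(i)  # out-of-vocabulary word
            continue
        score = context_score(words, i)
        if score is not None and score < threshold:
            suspects.append(i)  # weakly correlated with its neighbours
    return suspects


def suggest(words, i, min_similarity=0.3, alpha=0.5):
    """Pick the vocabulary word that maximises
    alpha * context confidence + (1 - alpha) * string similarity."""
    best, best_score = None, -1.0
    for cand in sorted(VOCAB):
        sim = SequenceMatcher(None, words[i], cand).ratio()
        if sim < min_similarity:
            continue  # fuzzy match too weak to be a candidate
        conf = context_score(words, i, cand) or 0.0
        combined = alpha * conf + (1 - alpha) * sim
        if combined > best_score:
            best, best_score = cand, combined
    return best


sentence = ["the", "c0urt", "reviewed", "the", "case"]
for idx in detect(sentence):
    print(idx, sentence[idx], "->", suggest(sentence, idx))
```

In this sketch the OCR-style error `c0urt` is flagged because it is out of vocabulary, and the correction step ranks candidates by both how well they fit the context and how closely they resemble the original string. A realistic system would need a much larger corpus and a threshold tuned on held-out data.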
Keywords/Search Tags: Proofreading of Chinese text, Language model, Fuzzy matching, End-to-end sequence, Attention mechanism