
Research And Application Of OCR Conversion Text Error Correction Method Based On Knowledge Graph

Posted on: 2021-03-28  Degree: Master  Type: Thesis
Country: China  Candidate: X W Zhang  Full Text: PDF
GTID: 2428330623957647  Subject: Computer technology
Abstract/Summary:
The acquisition of large-scale data is the basis of research on big data and artificial intelligence. Text data comes mainly from regular text resources and from image files that contain rich text information, so extracting the large amount of usable text in image files is a prerequisite for data acquisition. At present, most text extraction from image files relies on OCR technology. However, OCR is based on image recognition, and recognition errors often occur when image files are converted. To improve the usability of text converted from image files and avoid cumbersome manual correction, this dissertation proposes a domain text error correction method for OCR output and studies the following aspects:

(1) A method for error correction of OCR-converted text is proposed. By improving the TF-IDF algorithm and combining it with cosine similarity calculation, a screening algorithm is designed that generates weights for, and ranks, multiple candidate word lists. Combined with the error rule inference model and the feature crossover algorithm designed in this dissertation, this yields a method for detecting and correcting errors in OCR-converted text, which is verified in actual application scenarios.

(2) The heuristic role of Chinese character construction attributes in OCR-converted text error correction is explored. This dissertation designs and constructs a knowledge graph of Chinese characters based on their structural properties, and uses it to help the knowledge inference model predict the error rules between Chinese characters that arise during OCR conversion, which effectively improves the model's reasoning ability.

(3) A knowledge reasoning model based on differential encoding is designed. Entities of different data types in the knowledge graph are encoded with different deep learning models, so that the resulting feature matrices are semantically richer. The differentially encoded feature matrices are then stitched together, and the ConvE convolutional model performs convolutional learning on the combined feature matrix. Experimental evaluation is carried out on the collected OCR conversion error rule data set and the public MovieLens data set. The experiments show that, compared with current mainstream knowledge reasoning models, the proposed model achieves a substantial improvement on MRR, Hits@1, Hits@2 and other indicators.

(4) An algorithm for feature enhancement of the triplet feature matrix is introduced. By crossing the relation matrix with the head and tail entity matrices, more representative head and tail entity matrices are obtained as the input of the convolutional prediction network. Integrating this algorithm into the error rule inference model effectively improves the model's experimental results on the data sets.

To verify the effectiveness of the proposed correction method, this dissertation designs and implements an OCR text error correction system and tests it in actual usage scenarios.
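
The candidate screening step in aspect (1) can be illustrated with a minimal sketch. It uses a standard smoothed TF-IDF, not the improved variant developed in the dissertation, and the function names, the toy corpus, and the idea of scoring each candidate by the words it co-occurs with are all assumptions made for illustration only.

```python
import math
from collections import Counter

def idf_table(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each token once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector (token -> weight); unseen tokens get weight 0."""
    tf = Counter(tokens)
    total = sum(tf.values())
    return {t: (c / total) * idf.get(t, 0.0) for t, c in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(context, candidates, docs):
    """Rank candidate corrections by how well their corpus co-occurrence
    profile matches the context around the suspect word (hypothetical scheme)."""
    idf = idf_table(docs)
    ctx_vec = tfidf(context, idf)
    scored = []
    for cand in candidates:
        # Profile: every other token in the documents that contain the candidate.
        profile = [t for doc in docs if cand in doc for t in doc if t != cand]
        scored.append((cand, cosine(ctx_vec, tfidf(profile, idf))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy domain corpus (assumed data): OCR often confuses "rn" with "m".
docs = [
    ["modern", "neural", "network", "training"],
    ["modem", "dial", "phone", "line"],
    ["neural", "network", "model", "training", "data"],
]
context = ["neural", "network", "training"]  # words around the suspect token
for word, score in rank_candidates(context, ["modem", "modern"], docs):
    print(f"{word}: {score:.3f}")
```

In this toy setting the candidate whose co-occurrence profile overlaps the context is ranked first; a real system would work on segmented Chinese text and a much larger domain corpus.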
Keywords/Search Tags:OCR, Text correction, Deep learning, Knowledge Inference, TF-IDF
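
MRR and Hits@k, the indicators named in the abstract, are standard ranking metrics for knowledge graph link prediction. The sketch below shows how they are commonly computed from the rank assigned to each correct entity; the function names and the example scores are illustrative assumptions, not values from the dissertation.

```python
def rank_of(scores, true_entity):
    """Rank of the correct entity (1 = best): one plus the number of
    entities that received a strictly higher score."""
    target = scores[true_entity]
    return 1 + sum(1 for s in scores.values() if s > target)

def mrr(ranks):
    """Mean reciprocal rank over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Assumed example: model scores for three candidate tail entities.
scores = {"a": 0.9, "b": 0.5, "c": 0.1}
ranks = [rank_of(scores, "a"), rank_of(scores, "b"), 4]
print(mrr(ranks), hits_at_k(ranks, 1), hits_at_k(ranks, 2))
```

Higher values are better for both metrics; MRR rewards placing the correct entity near the top, while Hits@k only checks whether it falls within the first k positions.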