Research On Error Correction Technology Of Text Recognition Based On Hidden Markov Model

Posted on:2021-09-11

Degree:Master

Type:Thesis

Country:China

Candidate:C Peng

GTID:2518306569995009

Subject:Information and Communication Engineering

Abstract/Summary:

With the advent of the Internet age, digitized information has become ubiquitous in daily life. For example, students once relied on paper books as the medium of information in class, but in recent years electronic teaching tools such as courseware and PPT slides have become increasingly common. Electronic books are also growing more popular among students because of their lower cost. This shift brings its own problems, however. For example, text embedded in PDFs or in images within electronic books cannot be edited directly like an ordinary document, which makes searching, modifying, or statistical data entry inconvenient. In response to these problems, this dissertation proposes a text recognition and error correction system based on the hidden Markov model.

Layout analysis model based on projection and contour fusion. The recognition target of this dissertation is the text area of a document image, so non-text areas such as pictures, formulas, and tables must first be separated out through layout analysis. Projection analysis and contour detection are both used to locate the non-text areas. The projection method exploits the difference between the projection features of non-text areas and those of characters, while the contour method exploits the difference between their contour features. Finally, the best text area is obtained by fusing the results of the two methods. Experimental results show that the fusion method segments non-text regions more effectively and paves the way for the recognition and error correction of text line images.

Text recognition model based on deep learning. Traditional text recognition methods need to locate the area of each character, so their recognition performance depends on the quality of character segmentation. Character segmentation, however, is very difficult given the characteristics of Chinese characters. This dissertation adopts a DenseNet + BiLSTM + CTC deep learning framework as the text recognition model. During recognition, characters are segmented and recognized automatically according to the features of the text image, which avoids manual character segmentation. Experimental results show that the deep learning model used in this dissertation effectively completes the text recognition task.

Text error correction model based on the hidden Markov model. Text recognition cannot guarantee 100% accuracy, so error correction is needed to improve the recognition rate. The OCR recognition result is taken as the observation state, and the corresponding correct character is taken as the hidden state. In this way, text error correction is transformed into the problem of finding the most likely correct character sequence given the sequence of recognition results. The recognition rate is improved to 99.02%.

To sum up, this dissertation extracts the characters of document images through layout analysis and text recognition, then corrects errors in those characters with the hidden Markov model, completing the information extraction of document images.
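The hidden Markov formulation described above (OCR output as observations, correct characters as hidden states) is commonly decoded with the Viterbi algorithm. The abstract does not say which decoder the dissertation uses, so the following is only a minimal sketch under that assumption. The function name `viterbi`, the helper `correct`, the four-letter Latin alphabet, and every probability value are hypothetical and chosen for illustration; a real system would estimate the tables from a corpus and from OCR confusion statistics over Chinese characters.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely hidden-state sequence for the observations.

    Probabilities are handled in log space; missing or zero entries are
    floored so that math.log never receives 0.
    """
    def lg(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s, previous state)
    first = observations[0]
    best = [{
        s: (lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(first, 0.0)), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, parent = max(
                (prev[r][0] + lg(trans_p[r].get(s, 0.0)), r) for r in states
            )
            column[s] = (score + lg(emit_p[s].get(obs, 0.0)), parent)
        best.append(column)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    path.reverse()
    return path


# Toy model (all values are made up for illustration).
STATES = ["c", "a", "o", "t"]
START_P = {"c": 0.7, "a": 0.1, "o": 0.1, "t": 0.1}
# Character-to-character transitions, standing in for a language model.
TRANS_P = {
    "c": {"a": 0.8, "o": 0.1, "t": 0.05, "c": 0.05},
    "a": {"t": 0.8, "c": 0.1, "o": 0.05, "a": 0.05},
    "o": {"t": 0.3, "c": 0.3, "a": 0.2, "o": 0.2},
    "t": {"c": 0.25, "a": 0.25, "o": 0.25, "t": 0.25},
}
# P(OCR output | true character): here "a" is often misread as "o".
EMIT_P = {
    "c": {"c": 0.9, "o": 0.1},
    "a": {"a": 0.6, "o": 0.4},
    "o": {"o": 0.9, "a": 0.1},
    "t": {"t": 1.0},
}


def correct(ocr_text):
    """Map an OCR string to the most likely true string under the toy model."""
    return "".join(viterbi(list(ocr_text), STATES, START_P, TRANS_P, EMIT_P))


print(correct("cot"))  # prints "cat"
```

In this toy setup the transition table plays the role of a character-level language model and the emission table plays the role of an OCR confusion matrix. The misread "o" is corrected because the path c, a, t has a higher joint probability than c, o, t.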

Keywords/Search Tags:

hidden markov model, text recognition, text error correction, deep learning, layout analysis

Related items

1	Algorithm Research For Text Information Extraction Based On Hidden Markov Model
2	Research And Application Of Text Error Detection And Correction After Speech Recognition
3	Research On Chinese Text Error Correction Method Based On Deep Learning
4	Short Text Enhanced Learning Analysis Method Of Online Learning Community
5	A Study On Deep Learning Based Chinese Text Detection And Recognition
6	Research And Implementation Of Intelligent Recognition Of Medical Laboratory Report Image Based On Deep Learning
7	Web Free Text Information Extraction Based On TABLE Layout And Hidden Markov Model
8	Research On Text Detection And Recognition Technology Based On Deep Learning Methods
9	Research Of Web Text Mining Technology Based On Hidden Markov Model
10	Research And Application Of OCR Conversion Text Error Correction Method Based On Knowledge Graph