
Research On Error Correction Technology Of Text Recognition Based On Hidden Markov Model

Posted on: 2021-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Peng
GTID: 2518306569995009
Subject: Information and Communication Engineering
Abstract/Summary:
With the advent of the Internet age, digitized information can be seen everywhere in people's lives. For example, students used to rely on paper books as the medium of information in class, but in recent years electronic teaching tools such as courseware and PPT have become more and more common. Electronic books are also becoming more popular among students because of their lower cost. At the same time, this brings a series of problems. For example, the text in PDFs or in pictures in electronic books cannot be edited directly like an ordinary document, which causes many inconveniences for information search, modification, or statistical entry. In response to these problems, this dissertation proposes a text recognition and error correction system based on the hidden Markov model.

Layout analysis model based on projection contour fusion. The recognition object of this dissertation is the text area in the document image, so non-text areas such as pictures, formulas, and tables must be segmented out through layout analysis. In this dissertation, projection analysis and contour detection are used to locate the non-text areas. The projection method uses the difference between the projection features of non-text areas and those of characters, while the contour method uses the difference between the contour features of non-text areas and those of characters. Finally, the best text area is obtained by a fusion analysis of their results. The experimental results show that the fusion method segments non-text regions better and paves the way for the recognition and error correction of text line images.

Text recognition model based on deep learning. Traditional text recognition methods need to locate the area of each character, so their recognition performance depends on the quality of character segmentation. However, character segmentation is a very difficult problem because of the characteristics of Chinese characters. In this dissertation, we adopt a DenseNet + BiLSTM + CTC deep learning framework as the text recognition model. During recognition, characters are automatically segmented and recognized according to the features of the text image, which effectively avoids manual character segmentation. Experimental results show that the deep learning model used in this dissertation effectively completes the task of text recognition.

Text error correction model based on the hidden Markov model. The results of text recognition cannot guarantee 100% accuracy, so it is necessary to improve the recognition rate through text error correction. The recognition result of OCR is taken as the observation state, and the corresponding correct character is taken as the hidden state. In this way, the problem of text error correction is transformed into the problem of finding the most likely correct character sequence given the recognition result sequence. The recognition rate is improved to 99.02%.

To sum up, this dissertation extracts the characters of document images through layout analysis and text recognition, then corrects the errors in those characters based on the hidden Markov model, and thereby completes the information extraction of document images.
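The error correction formulation above (OCR output as observations, correct characters as hidden states) can be illustrated with a minimal sketch. The abstract does not say which decoding algorithm the thesis uses; the example below assumes Viterbi decoding, the standard way to find the most likely hidden sequence in a hidden Markov model. The function name `viterbi`, the four-letter Latin alphabet, and every probability in `START_P`, `TRANS_P`, and `EMIT_P` are hypothetical values chosen for illustration and are not taken from the thesis.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations.

    Works in log space; missing or zero probabilities are floored so that
    log() is always defined (an illustrative smoothing choice).
    """
    if not observations:
        return []

    def lg(p):
        return math.log(max(p, 1e-12))

    # scores[t][s]: best log-probability of any path ending in state s at step t
    scores = [{s: lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(observations[0], 0.0))
               for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at step t

    for obs in observations[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + lg(trans_p[r].get(s, 0.0)))
            cur[s] = prev[best] + lg(trans_p[best].get(s, 0.0)) + lg(emit_p[s].get(obs, 0.0))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Hypothetical toy model: hidden states are the correct characters,
# observations are what the OCR engine emitted.
STATES = ["c", "a", "t", "o"]
START_P = {"c": 0.7, "a": 0.1, "t": 0.1, "o": 0.1}
TRANS_P = {  # P(next correct char | current correct char), a character bigram model
    "c": {"a": 0.8, "o": 0.1, "t": 0.05, "c": 0.05},
    "a": {"t": 0.8, "c": 0.1, "a": 0.05, "o": 0.05},
    "o": {"t": 0.3, "c": 0.3, "a": 0.2, "o": 0.2},
    "t": {"a": 0.25, "o": 0.25, "c": 0.25, "t": 0.25},
}
EMIT_P = {  # P(OCR output | correct char), an OCR confusion model
    "c": {"c": 1.0},
    "t": {"t": 1.0},
    "a": {"a": 0.7, "o": 0.3},  # "a" is sometimes misread as "o"
    "o": {"o": 0.9, "a": 0.1},
}

if __name__ == "__main__":
    corrected = viterbi(list("cot"), STATES, START_P, TRANS_P, EMIT_P)
    print("".join(corrected))  # prints "cat"
```

In this toy model the OCR output "cot" is corrected to "cat" because the transition from "c" to "a" is much more likely than from "c" to "o", which outweighs the emission evidence for "o". In a real system the transition table would typically be estimated from a text corpus and the emission table from observed OCR confusions, and the state space would cover the full character set.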
Keywords: hidden Markov model, text recognition, text error correction, deep learning, layout analysis