
Text Line Segmentation And Correction On Scanned Image Based On Graph Theory

Posted on: 2018-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhong
GTID: 2348330542456773
Subject: Control engineering
Abstract/Summary:
In recent years, digital libraries have relied on optical character recognition (OCR) to digitize document material. Text line segmentation and correction are an important part of an OCR system and have significant research value, because segmentation errors on distorted text lines reduce the character recognition rate. Achieving accurate and efficient text line segmentation is therefore one of the most important problems in an OCR system. This thesis reviews the state of research, in China and abroad, on text line segmentation and correction. Building on existing techniques, it presents a new text line segmentation and correction method for warped and tilted scanned document images. The main content of the thesis is as follows:

1. For detecting the positions of the individual characters in a text line, existing algorithms for smoothing and denoising, binary morphology, and connected domain analysis are analyzed and compared. A mean filter is chosen to smooth and denoise the image. Linear structuring elements are used to merge the strokes of each single character into one connected domain. Unit connected domains whose area or height does not meet a set threshold are filtered out.

2. For text line segmentation, the thesis proposes a new method for warped and tilted scanned document images. The method treats the unit connected domain of each character as a character node. By building a graph model of the scanned document image, the text line segmentation problem is converted into a shortest path search. A shortest path algorithm is used to extract the character nodes that belong to the same text line. The proposed method obtains accurate text line segmentation on warped document images efficiently.

3. A text line correction method is proposed at the connected domain level. Taking the position of the first connected domain in a line as the reference, all connected components of the same line that follow the first character are rearranged so that the warped text line becomes straight.

4. The performance of the proposed segmentation and correction method is evaluated both subjectively and objectively, using the correct detection rate and the error detection rate as objective metrics. A software tool comprising a user interface and the processing algorithms is developed. It produces accurate text line segmentation results and corrects warped text lines effectively. The segmentation and correction algorithm has been tested on multiple images, and the experimental results demonstrate the effectiveness of the method.
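
The abstract does not state how the graph is built or how its edges are weighted, so the following Python sketch only illustrates the general idea of items 2 and 3 under stated assumptions. Character nodes are represented by the centers of their connected domains; the edge rule (`max_dx`), the squared-distance cost, the `vertical_penalty` factor, and the function names are all hypothetical choices made for this illustration, not the thesis's actual design. Dijkstra's algorithm, which the keywords mention, is used for the shortest path search.

```python
import heapq

def build_graph(centers, max_dx=50.0, vertical_penalty=4.0):
    """Directed graph over character nodes (assumed construction).

    An edge i -> j exists when node j lies to the right of node i within
    max_dx pixels. The cost dx^2 + vertical_penalty * dy^2 is an assumption:
    squaring favors short hops, and the penalty discourages line jumps.
    """
    graph = {i: [] for i in range(len(centers))}
    for i, (xi, yi) in enumerate(centers):
        for j, (xj, yj) in enumerate(centers):
            dx, dy = xj - xi, yj - yi
            if 0 < dx <= max_dx:
                graph[i].append((j, dx * dx + vertical_penalty * dy * dy))
    return graph

def dijkstra(graph, source):
    """Standard Dijkstra with a binary heap; returns distances and predecessors."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def shortest_path(graph, source, target):
    """Node indices on the cheapest path, or None if target is unreachable."""
    dist, prev = dijkstra(graph, source)
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def straighten(centers, path):
    """Assumed correction step: move every node on the path to the
    vertical position of the first node, keeping its horizontal position."""
    _, y0 = centers[path[0]]
    return [(centers[i][0], y0) for i in path]

# Two synthetic warped lines, six character centers each (made-up data).
warp = [0, 2, 5, 7, 8, 10]
centers = [(20 * k, 10 + warp[k]) for k in range(6)] + \
          [(20 * k, 50 + warp[k]) for k in range(6)]
graph = build_graph(centers)

line_a = shortest_path(graph, 0, 5)    # first to last character of the upper line
line_b = shortest_path(graph, 6, 11)   # first to last character of the lower line
print(line_a, line_b)
print(straighten(centers, line_a))
```

In this toy setup the start and end nodes of each line are given by hand. A real pipeline would need its own rule for choosing them, for example from the leftmost and rightmost unassigned components, and the abstract does not say how the thesis does this.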
Keywords/Search Tags: Graph theory model, Shortest path, Dijkstra's algorithm, Text line segmentation, Text line correction, Connected domain unit