
Low-quality Document Image Binarization Study

Posted on: 2013-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: L N Hu
Full Text: PDF
GTID: 2218330371459719
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Binarization is a key preprocessing step in automatic document processing systems, and it directly affects overall system performance. Document images are degraded by complex backgrounds, weak strokes, and many other factors, and binarizing them remains an active and unsolved research problem. This thesis analyzes the main causes of document quality degradation and focuses on how to binarize document images with weak strokes, ink bleed-through, and uneven backgrounds.

First, we study the document binarization algorithm based on the local maximum and minimum proposed by Su, and then propose an improved algorithm based on gradient standardization. The method first detects the edge points of character strokes according to the standardized gradient, then obtains the stroke edge regions with an extremum filter, and finally binarizes the document image using a local threshold computed from the stroke edge regions. We run experiments with the Otsu algorithm, the Niblack algorithm, the Su algorithm, and our method on the document images provided in the paper. The results show that the proposed gradient standardization method not only detects the target character information effectively but also produces less noise.

Visual attention has been widely used in target detection, natural image compression, image retrieval, visual interface design, and other fields. However, there are few reports of its application in document processing systems. This thesis examines document image binarization from the perspective of visual attention and proposes two methods, both based on a saliency map. The global threshold method applies a single threshold to binarize the character regions. Because its result is related to character size and to the distribution of character regions, it does not perform very well: the results show that it is better than the Otsu and Niblack methods but worse than the Su method. The local threshold method applies local thresholds to binarize the character regions. The results show that it is better than the Otsu, Niblack, and Su methods.
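For readers unfamiliar with the two classical baselines named in the abstract, the following is a minimal NumPy sketch of a global Otsu threshold and a Niblack local threshold. It is not the thesis's implementation. The function names, the window size of 15, the coefficient k = -0.2, and the convention that dark pixels are text are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def otsu_threshold(gray):
    """Global Otsu threshold for a uint8 image (maximizes between-class variance)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))     # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    valid = denom > 0                         # skip cuts that leave one class empty
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def niblack_binarize(gray, window=15, k=-0.2):
    """Niblack local threshold T = mean + k * std over a window x window patch.

    Assumes an odd window smaller than the image, and dark text on a light
    background. Returns a boolean mask that is True at text pixels.
    """
    pad = window // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    patches = sliding_window_view(padded, (window, window))
    mean = patches.mean(axis=(-2, -1))
    std = patches.std(axis=(-2, -1))
    # Strict comparison keeps perfectly flat regions (std == 0) as background.
    return gray < mean + k * std


# Hypothetical usage on a synthetic page: light background, one dark stroke.
page = np.full((32, 32), 200, dtype=np.uint8)
page[:, 10:12] = 50
global_mask = page <= otsu_threshold(page)    # one threshold for the whole page
local_mask = niblack_binarize(page)           # one threshold per pixel
```

The global method uses one cut for the whole page, so it tends to struggle when the background is uneven. The local method adapts to each neighborhood but is known to respond to small variations in background regions, which is the kind of noise the abstract's comparison refers to.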
Keywords/Search Tags: degraded document, binarization, gradient standardization, visual attention, saliency map