Research On Degraded Document Image Binarization

Posted on: 2017-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zhao
Full Text: PDF
GTID: 2348330533950386
Subject: Navigation, Guidance and Control
Abstract/Summary:
Document image binarization is a key stage of automatic document processing systems, and its performance directly affects the subsequent steps, such as text extraction and recognition. Binarization of degraded document images remains an active and unsolved research problem because of degradation factors such as complex backgrounds and weak strokes. Based on an in-depth study of degraded document image binarization, this thesis presents two algorithms: one based on local contrast and stroke width estimation, the other using support vector machine (SVM) classification.

The first algorithm begins by enhancing the local contrast of the document image. The Otsu algorithm is then applied to obtain a globally optimal threshold. Next, the stroke width is estimated with the outline method, and this estimate determines the size of the neighborhood window. In this way, the character foreground can be separated from the background.

Because the Otsu algorithm is used for threshold segmentation, the quality of this initial binarization also affects the final result. To improve it, the first algorithm was studied further and a new algorithm was proposed on its basis. First, the whole image is divided into a 5×5 grid, so that each image is ultimately split into 25 blocks, and each block is processed separately. An SVM is then used to classify the document images: by extracting eleven kinds of features, they are assigned to one of three categories, and a different global thresholding method is applied to each category to produce the initial binarization. Next, the complete binary image assembled from the processed blocks undergoes local binarization using stroke width estimation, where the size of the sliding window is determined by the stroke width. In this way, noise and false-positive pixels are effectively eliminated.

The two proposed methods are compared with ten classical algorithms. Binarization performance is evaluated with six commonly used measures: F-measure, PSNR, SSIM, NRM, DRD and MPM. Experimental results show that the first proposed algorithm not only preserves stroke details well but also suppresses the document background. The SVM-based algorithm further improves the quality of the binary image and shows clear advantages on the evaluation measures.
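Two of the building blocks named in the abstract, Otsu global thresholding and the F-measure and PSNR evaluation measures, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the flat-list image representation, and the convention that dark pixels (value at or below the threshold) are text are all assumptions made for illustration, and the PSNR formula assumes binary images with a peak value of 1.

```python
import math


def otsu_threshold(pixels):
    """Return the gray level t in 0..255 that maximizes between-class variance.

    `pixels` is a flat list of integer gray values (hypothetical input format).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the dark class yet
        if w1 == 0:
            break     # all pixels are in the dark class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Label dark pixels (<= t) as 1 (text) and the rest as 0 (background)."""
    return [1 if p <= t else 0 for p in pixels]


def f_measure(pred, truth):
    """Harmonic mean of precision and recall over text pixels (label 1)."""
    tp = sum(1 for p, g in zip(pred, truth) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, truth) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, truth) if p == 0 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def psnr(pred, truth):
    """PSNR in dB between two binary images, assuming a peak value of 1."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, truth)) / len(truth)
    return math.inf if mse == 0 else 10 * math.log10(1.0 / mse)


if __name__ == "__main__":
    # Toy "image": 10 dark text pixels followed by 30 bright background pixels.
    image = [20] * 10 + [200] * 30
    ground_truth = [1] * 10 + [0] * 30
    result = binarize(image, otsu_threshold(image))
    print(f_measure(result, ground_truth), psnr(result, ground_truth))
```

A single global threshold of this kind is only the initial step in both algorithms described above; the stroke-width-driven local window is what refines the result, and it is not reproduced here because the abstract does not specify it in enough detail.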
Keywords/Search Tags: degraded document image binarization, local contrast, stroke width estimation, SVM classifier