
Researches On Bleed-through Removal Algorithm For Scanned Historical Document Images

Posted on: 2017-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Hu
Full Text: PDF
GTID: 2348330488475908
Subject: Control Science and Engineering
Abstract/Summary:
Historical documents are books that were not produced by modern printing technology. They are rich in historical records and indispensable for the study of ancient society. Digitizing historical documents offers great advantages for disseminating and using this information. Beyond the character content, digital images of historical documents should also give users reliable, high-quality image information. However, owing to age or improper preservation, historical documents have degraded seriously in quality, so restoring their images has become an important issue in digitization. Bleed-through is a particularly important problem for scanned images of historical documents. It is a form of image degradation in which ink seeps through the paper and becomes visible on the other side of the same page, which can make the documents hard to read, whether manually or automatically. Removing bleed-through from scanned images of historical documents effectively and efficiently has therefore become an increasingly important issue in their digitization. This thesis studies bleed-through removal algorithms for scanned historical document images.

In this thesis, we review the state of research on bleed-through removal. Based on a survey of the existing methods, we propose a support vector machine (SVM) based method for non-blind bleed-through removal and an algorithm based on global and local features for blind bleed-through removal. The main contents of this thesis are as follows:

1. For information extraction from scanned images, we analyze the commonly used image feature extraction algorithms and propose a global feature extraction method based on the Gaussian mixture model (GMM). Following the content characteristics of scanned historical documents, we fit a GMM to the intensity distribution of the scanned image and use the GMM parameters as the global features.

2. For non-blind bleed-through removal, we propose an algorithm based on the support vector machine. K-means clustering is applied to the registered pair of scanned images. Training samples for the SVM classifier are then selected at random according to the features of the image pair. Finally, based on the classification results, images free of bleed-through are obtained by image inpainting. The algorithm is simple to operate and meets the requirements of non-blind removal.

3. For blind bleed-through removal, we first analyze the effect of global and local features on bleed-through removal and then propose a blind bleed-through removal method for scanned images based on both. The global features of the image are the parameters of the fitted GMM, and the local features are extracted from the patch around each pixel. An extreme learning machine (ELM) classifier then classifies the scanned images using these features. Finally, we remove the bleed-through and inpaint the images. Experiments on a dataset of scanned historical documents show that the proposed method effectively removes the bleed-through regions in different test images.
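The GMM-based global feature described in item 1 can be illustrated with a short sketch. This is not the thesis's implementation: the number of components (three, loosely standing for ink, bleed-through, and paper), the evenly spaced initialisation, the plain EM loop, and the function names `fit_gmm_1d` and `gmm_global_features` are all assumptions made for illustration, and the pixel intensities are synthetic.

```python
import math
import random


def fit_gmm_1d(values, k=3, iters=100):
    """Fit a k-component 1-D Gaussian mixture to `values` with plain EM.

    Returns (weight, mean, variance) tuples sorted by mean.
    The choice k=3 and the initialisation are illustrative assumptions.
    """
    n = len(values)
    lo, hi = min(values), max(values)
    span = max(hi - lo, 1e-6)
    # Start with means spread evenly over the intensity range.
    means = [lo + span * (j + 0.5) / k for j in range(k)]
    variances = [(span / k) ** 2 for _ in range(k)]
    weights = [1.0 / k for _ in range(k)]

    for _ in range(iters):
        # E-step: accumulate responsibility-weighted sums per component.
        sum_r = [0.0] * k
        sum_rx = [0.0] * k
        sum_rxx = [0.0] * k
        for x in values:
            dens = [
                weights[j] / math.sqrt(2.0 * math.pi * variances[j])
                * math.exp(-((x - means[j]) ** 2) / (2.0 * variances[j]))
                for j in range(k)
            ]
            total = sum(dens)
            for j in range(k):
                # Fall back to uniform responsibilities on numerical underflow.
                r = dens[j] / total if total > 0.0 else 1.0 / k
                sum_r[j] += r
                sum_rx[j] += r * x
                sum_rxx[j] += r * x * x
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            weights[j] = sum_r[j] / n
            if sum_r[j] < 1e-9:
                continue  # empty component: keep its previous mean/variance
            means[j] = sum_rx[j] / sum_r[j]
            variances[j] = max(sum_rxx[j] / sum_r[j] - means[j] ** 2, 1e-6)

    return sorted(zip(weights, means, variances), key=lambda c: c[1])


def gmm_global_features(pixels, k=3):
    """Flatten the fitted GMM parameters into one global feature vector."""
    feats = []
    for w, m, v in fit_gmm_1d(pixels, k=k):
        feats.extend([w, m, v])
    return feats


# Synthetic grey levels standing in for a scanned page (assumed values).
rng = random.Random(0)
pixels = (
    [rng.gauss(40, 8) for _ in range(150)]      # dark foreground ink
    + [rng.gauss(130, 12) for _ in range(250)]  # mid-grey bleed-through
    + [rng.gauss(200, 10) for _ in range(600)]  # bright paper background
)
features = gmm_global_features(pixels)
print([round(f, 2) for f in features])
```

The resulting vector holds one (weight, mean, variance) triple per component, ordered by mean intensity, so the same position in the vector refers to the same kind of region across images. In a blind-removal pipeline of the kind the abstract outlines, a vector like this would be combined with per-pixel patch features before classification.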
Keywords/Search Tags: Scanned image, Bleed-through, Feature extraction, Gaussian Mixture Model, Support vector machine, Extreme learning machine