Optimization Based On Binary Algorithms Of Document Image

Posted on: 2016-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: S L Zhang
Full Text: PDF
GTID: 2308330464471556
Subject: Information and Communication Engineering
Abstract/Summary:
As time goes on, a growing volume of paper documents requires ever more storage space, and such documents are inconvenient to use and search. Storing them digitally makes them easier to manage and use. A common digitization workflow is to photograph the documents, save them as document images, binarize the images, and retain the text information for subsequent processing. Binarization is a key step: its accuracy directly affects the accuracy of every subsequent processing step, so binarizing document images accurately is vital.
Many paper documents, however, inevitably degrade over time: the paper ages, traces of use accumulate, and handwriting on the reverse side bleeds through. This degradation makes such document images much harder to binarize. Over the past ten years, binarization algorithms have therefore advanced steadily, aiming to improve both binarization accuracy and adaptability to historical document images. However, because the number of document images is huge, new types of document images keep appearing, which makes it much harder for a binarization algorithm to adapt. The accuracy of a given binarization algorithm varies from one document image to another, and it is difficult for any single algorithm to adapt to all existing and future types of document images, so searching for one such binarization algorithm is not an ideal solution.
Therefore, building on existing document image binarization algorithms, we propose an optimization method that improves the accuracy of the results and the adaptability of an existing binarization algorithm.
The method keeps the merits of the existing binarization algorithm while enhancing its adaptability to various kinds of document images and improving its accuracy on them. First, we use the K-means algorithm to obtain text classification information for the image. Then, we extract the connected components of the binarization result produced by the existing algorithm and label each connected component individually. Finally, we classify the pixels in each connected component and remove the background pixels that were mistakenly binarized as text, thereby improving the accuracy of the binarization algorithm.
For document image processing, more accurate binarization results help ensure the accuracy of character recognition and improve the efficiency of follow-up work, which is significant for practical applications.
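The three steps of the method (K-means classification, connected-component labeling of an existing binarization result, and per-component pixel classification) can be illustrated with a minimal sketch. It is not the thesis implementation, and the abstract does not give the exact rules, so several choices here are assumptions: dark text on a light background, two K-means clusters over gray levels, 4-connectivity, and a hypothetical `min_text_ratio` parameter that drops a whole component when too few of its pixels look like text. The function names are also illustrative. The input `mask` stands for the output of any existing binarization algorithm.

```python
from collections import deque


def kmeans_1d(values, k=2, iters=20):
    """Lloyd's k-means on scalar gray levels; returns the centres in ascending order."""
    lo, hi = min(values), max(values)
    # Assumption: spread the initial centres evenly over the intensity range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centres[c]))
            sums[j] += v
            counts[j] += 1
        new = [sums[j] / counts[j] if counts[j] else centres[j] for j in range(k)]
        if new == centres:
            break
        centres = new
    return sorted(centres)


def label_components(mask):
    """Return the 4-connected foreground components of a 0/1 mask as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(comp)
    return components


def refine_binarization(gray, mask, min_text_ratio=0.2):
    """Remove likely background pixels from an existing binarization result.

    gray: 2D list of gray levels; mask: 2D list of 0/1 from any binarization algorithm.
    """
    # Step 1: K-means over all gray levels; the darker centre is assumed to be text.
    text_c, bg_c = kmeans_1d([v for row in gray for v in row])
    refined = [[0] * len(row) for row in mask]
    # Step 2: label each connected component of the existing result.
    for comp in label_components(mask):
        # Step 3: keep only pixels closer to the text centre than to the background centre.
        text_pixels = [(y, x) for y, x in comp
                       if abs(gray[y][x] - text_c) <= abs(gray[y][x] - bg_c)]
        # Assumed rule: a component with too few text-like pixels is treated as noise.
        if len(text_pixels) < min_text_ratio * len(comp):
            continue
        for y, x in text_pixels:
            refined[y][x] = 1
    return refined
```

Calling `refine_binarization(gray, mask)` with the mask from a permissive global threshold would keep dark strokes while discarding mid-gray pixels, such as bleed-through, that the threshold had marked as text. In practice the component-level rule and the number of clusters would need tuning for each collection.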
Keywords/Search Tags: document image, binarization, optimization, k-means, connected component