Research And Implementation Of Chinese Document Digitization Based On Smart Phone

Posted on: 2018-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2348330566954792
Subject: Engineering
Abstract/Summary:
Smartphones are now widespread and have increasingly powerful cameras and processors, so a smartphone-based document digitization system could make people's work and daily life more convenient. The main contributions of this thesis are as follows. First, we studied Google's open-source Tesseract-OCR project in depth and optimized it for Simplified Chinese, making recognition more than four times faster while also raising the recognition rate. The optimization method also applies to Traditional Chinese, Japanese and Korean. Second, because the quality of document images shot with a mobile phone is unstable, we discuss guidance given to the user while shooting and the image processing applied afterwards, such as tilt correction, denoising and binarization, so that the OCR engine receives high-quality input. We also designed a set of algorithms, called the adaptive dual threshold method, for binarizing images under complex background and illumination conditions. Third, building on this work, we designed and implemented a prototype document digitization system on the Android mobile platform. It uses the enhanced image processing module and the optimized Tesseract-OCR to recognize text and layout, and then generates searchable PDF files by combining the OCR results with the processed images.
Keywords/Search Tags:Tesseract, OCR, Adaptive dual threshold, Document digitalization
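
The abstract names an adaptive dual threshold method for binarization but gives no details of it. The sketch below is therefore not the thesis's algorithm. It shows one generic way that two thresholds can be combined with a locally adaptive reference: each pixel is compared with the mean of its neighbourhood, clearly dark pixels are marked as ink, and borderline pixels are kept only if they connect to ink. The function names, the window `radius`, and the `strong` and `weak` ratios are all illustrative assumptions.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) summed-area table for a 2D list of grey values."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def local_mean(sat, y, x, radius, h, w):
    """Mean grey value of the window around (y, x), clipped at the borders."""
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((y1 - y0) * (x1 - x0))


def dual_threshold_binarize(img, radius=2, strong=0.6, weak=0.85):
    """Return a 2D list with 1 for ink and 0 for background.

    Assumed scheme (not taken from the thesis): a pixel darker than
    strong * local_mean is ink; a pixel darker than weak * local_mean
    is ink only if it is 8-connected to an ink pixel.
    """
    STRONG, WEAK = 2, 1
    h, w = len(img), len(img[0])
    sat = integral_image(img)
    label = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mean = local_mean(sat, y, x, radius, h, w)
            if img[y][x] < strong * mean:
                label[y][x] = STRONG
            elif img[y][x] < weak * mean:
                label[y][x] = WEAK

    # Start from the strong pixels and grow into connected weak pixels.
    out = [[1 if label[y][x] == STRONG else 0 for x in range(w)] for y in range(h)]
    stack = [(y, x) for y in range(h) for x in range(w) if label[y][x] == STRONG]
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    if label[ny][nx] == WEAK and not out[ny][nx]:
                        out[ny][nx] = 1
                        stack.append((ny, nx))
    return out
```

Because every pixel is compared with its own neighbourhood rather than with one global value, a shadow or lighting gradient across the page shifts both thresholds together. The second, weaker threshold lets faint stroke edges survive without admitting isolated background noise. A real pipeline on Android would more likely work on image buffers from an imaging library than on nested lists.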