
Research And Implementation Of PDF Conversion System To Generate HTML Resources For Multiple Terminals

Posted on: 2015-09-11  Degree: Master  Type: Thesis
Country: China  Candidate: Q Lin  Full Text: PDF
GTID: 2298330452953369  Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the mobile Internet era, reading habits have shifted from local storage to the cloud, and documents are increasingly read online rather than browsed locally. On existing online reading and sharing platforms, the common approach to online document browsing is to convert non-PDF documents into PDF and then provide online reading of the PDF. The dominant delivery format for online PDF browsing is SWF: the PDF document is first converted to SWF, and the resulting SWF file is then played online with a Flash player. However, a Flash-based reader cannot provide a good reading experience because of its poor sharpness, high loading failure rate, and limited functionality. In addition, on handheld devices with different screen sizes, such as phones and tablets, a Flash-based reader cannot adapt its presentation to the screen. On devices without a Flash player, the browser can only show the text of the PDF, and in the worst case the text is garbled.

The main way to solve the clarity problem is to convert PDF documents directly into HTML documents. The most common way to support multiple terminals is to change the display order of PDF elements from coordinate order to flow order. As a result, converting PDF into HTML and reordering PDF elements into a flow has become a new trend. This thesis discusses a PDF conversion system that generates HTML resources supporting multiple terminals. The contributions of the work are as follows:

A data format definition for Web rendering. The format includes not only the essential elements (text, images, and fonts) and the properties needed for PDF rendering, but also the absolute order in which the elements are rendered, so that a PDF can be rendered through either fixed layout or flow.

A text region detection algorithm. It improves the accuracy of text rearrangement and handles rearrangement problems such as multi-column documents.

A merging algorithm for vector elements. Merging reduces the fragmentation of graphics, so that vector graphics are displayed more completely on different terminals.

A PDF font processing system for Web browsing, which extracts fonts from PDF documents, reconstructs font files, converts fonts into other formats, and provides font files for multiple terminals, among other functions.

A PDF conversion system that builds on the above and is developed on the basis of the open source tool Xpdf.

A web deployment scheme for the conversion system. By properly deploying the conversion system, the font processing unit, and the data centers, an online reading platform can be easily constructed, and online reading services can be provided directly to users.
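The idea of a data format that supports both layout and flow rendering can be illustrated with a minimal Python sketch. It is not the format defined in the thesis: the `Element` record, its field names (`order`, `x`, `y`, and so on), and the two render functions are hypothetical, chosen only to show how storing an absolute element order next to the coordinates lets the same data be rendered either way.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Element:
    """One page element; all field names are illustrative assumptions."""
    kind: str            # "text" or "image"
    order: int           # absolute reading order of the element
    x: float             # left edge, in page units
    y: float             # top edge, in page units
    content: str         # the text itself, or an image URL
    font: str = "serif"
    size: float = 12.0


def _inner(e: Element) -> str:
    # Escape content so that characters such as "<" do not break the HTML.
    if e.kind == "image":
        return f'<img src="{escape(e.content, quote=True)}" alt="">'
    return escape(e.content)


def render_layout(elements: list[Element]) -> str:
    """Fixed layout: place every element at its original coordinates."""
    parts = []
    for e in elements:
        style = (f"position:absolute;left:{e.x}px;top:{e.y}px;"
                 f"font-family:{e.font};font-size:{e.size}px")
        parts.append(f'<div style="{style}">{_inner(e)}</div>')
    return '<div style="position:relative">' + "".join(parts) + "</div>"


def render_flow(elements: list[Element]) -> str:
    """Flow: ignore coordinates and emit elements in their absolute order."""
    parts = []
    for e in sorted(elements, key=lambda e: e.order):
        style = f"font-family:{e.font};font-size:{e.size}px"
        parts.append(f'<p style="{style}">{_inner(e)}</p>')
    return "".join(parts)


# A two-column page: sorting by y alone would interleave the columns,
# while the stored order keeps the left column together.
page = [
    Element("text", 2, 300.0, 50.0, "Right column, first line"),
    Element("text", 0, 20.0, 50.0, "Left column, first line"),
    Element("text", 1, 20.0, 70.0, "Left column, second line"),
]

print(render_layout(page))
print(render_flow(page))
```

In this sketch a wide screen could use `render_layout` to preserve the original page appearance, while a narrow screen could use `render_flow` so the text reflows to the available width. How the absolute order is actually derived (for example, from text region detection) is outside the scope of the sketch.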
Keywords/Search Tags: PDF2HTML, Flow Layout, Text Rearrangement, Font Processing