Design And Implementation Of PDF To EPUB E-book Format Conversion Tool

Posted on: 2017-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2348330542452407
Subject: Engineering
Abstract/Summary:
PDF (Portable Document Format) is an internationally accepted open standard for electronic documents. It defines a document format that is independent of the operating system platform, so PDF has become the main storage format for disseminating digital information and distributing electronic documents on the Internet. In the printing and publishing industry in particular, PDF is used as the industry standard. EPUB (Electronic Publication) is an open specification of the International Digital Publishing Forum (IDPF) whose predecessor, the Open eBook format, dates from 1999. It describes the content and structure of an e-book, and it serves as an open standard that unifies e-book formats and improves their compatibility. The most distinctive feature of an e-book that follows the specification is that its content can be reflowed according to the user's settings.

As the press's customer base has gradually expanded, its users increasingly need PDF documents converted to EPUB, so the company needs to add format conversion to its technical capabilities. Converting the contents of a PDF document into an EPUB document involves text resources, image resources, style resources, table-of-contents (catalog) data, metadata and other content. To give the originally fixed-layout PDF files the reflowable character of EPUB files, the layout of different kinds of content must remain consistent after conversion, which includes handling single-column plain text, multi-column plain text, mixed text and images, and so on. This thesis designs and implements a tool that converts PDF to the EPUB e-book format. The tool handles the conversion of text, images, styles, metadata, catalog data and other resources, and it processes the different layouts (single-column plain text, multi-column plain text, mixed text and images, and so on) so that they remain consistent after conversion.

The technologies used in developing the project are PDF document parsing, parsing of common image formats, XML data storage, EPUB document parsing, file compression and decompression, a font engine, a 2D image rendering engine, and others. Based on the project requirements and on how the conversion tool is realized, this thesis designs three functional modules: PDF resource extraction, typesetting rules, and EPUB document generation. An object-oriented development method is used to implement the conversion from PDF to the EPUB e-book format.

The implementation process is as follows. First, a PDF parsing engine parses the PDF document to obtain its metadata and catalog data and to extract its text, images and styles; the extracted resource data are then stored in XML files. Second, typesetting rules are applied to the resource data described in the XML files, including chapter division, line and paragraph segmentation, and delineation. Finally, the content data produced by the typesetting rules, together with the extracted logical description data and resource data, are organized according to the standard EPUB structure and compressed into a ZIP-format file, which yields a standard EPUB document. The conversion tool is deployed on both client and server and is tested with single-column plain text, multi-column plain text, and mixed text-and-image PDF files. The final test results show that the tool basically meets the project requirements and realizes the PDF to EPUB e-book format conversion function.
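The final packaging step described above (organizing content into the standard EPUB structure and compressing it into a ZIP file) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `chapter_xhtml`, `package_opf` and `build_epub`, the `OEBPS/` directory name, and the placeholder identifier are assumptions made for illustration. The sketch also omits the navigation document, images and stylesheets that a complete, valid EPUB would include. What it does show is the container layout the EPUB specification requires: an uncompressed `mimetype` entry first, then `META-INF/container.xml` pointing at the package file.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Fixed container file that tells a reader where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def chapter_xhtml(title, paragraphs):
    """Render one chapter (hypothetical output of the typesetting step) as XHTML."""
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"  <head><title>{escape(title)}</title></head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f"{body}\n"
        "  </body>\n"
        "</html>\n"
    )


def package_opf(book_title, chapter_count):
    """Build a simplified package document: metadata, manifest and spine."""
    ids = [f"ch{i}" for i in range(1, chapter_count + 1)]
    manifest = "\n".join(
        f'    <item id="{i}" href="{i}.xhtml" media-type="application/xhtml+xml"/>'
        for i in ids
    )
    spine = "\n".join(f'    <itemref idref="{i}"/>' for i in ids)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" '
        'unique-identifier="book-id">\n'
        '  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
        f"    <dc:title>{escape(book_title)}</dc:title>\n"
        # Placeholder identifier, assumed for this sketch only.
        '    <dc:identifier id="book-id">example-book-id</dc:identifier>\n'
        "    <dc:language>en</dc:language>\n"
        "  </metadata>\n"
        f"  <manifest>\n{manifest}\n  </manifest>\n"
        f"  <spine>\n{spine}\n  </spine>\n"
        "</package>\n"
    )


def build_epub(book_title, chapters):
    """Return the bytes of an EPUB-style ZIP container.

    chapters is a list of (chapter_title, [paragraph, ...]) tuples.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry must come first and must be stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/content.opf",
                         package_opf(book_title, len(chapters)),
                         compress_type=zipfile.ZIP_DEFLATED)
        for index, (title, paragraphs) in enumerate(chapters, start=1):
            archive.writestr(f"OEBPS/ch{index}.xhtml",
                             chapter_xhtml(title, paragraphs),
                             compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Calling `build_epub("Demo", [("Chapter 1", ["First paragraph."])])` returns bytes that can be written to a `.epub` file. Building the archive in memory keeps the sketch free of file-system assumptions; a real tool would also add the catalog data as a navigation document and list every image and stylesheet in the manifest.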
Keywords/Search Tags: PDF Document, EPUB Document, Conversion Tools, Layout Rules, XML Data Storage Technology