Research And Implementation Of Network Chinese PDF File Confidentiality Review System

Posted on: 2016-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: M T Duan
Full Text: PDF
GTID: 2348330542475451
Subject: Computer Science and Technology
Abstract/Summary:
As the Internet has developed, transferring information in documents, especially PDF files, has become increasingly common. Traditional confidentiality review focuses mostly on reassembling Internet protocol traffic, and a large body of research and many products exist in this area. However, this work has largely ignored the confidentiality review of attachments, including PDF and office documents, that are transferred over application-layer protocols such as SMTP and FTP. To fill this gap in network confidentiality review, we designed a system that performs confidentiality review on PDF documents with Chinese content. Because such a system depends on extracting content from network traffic, it places high demands on time and space efficiency. In this dissertation we therefore design the system, study its bottlenecks, and introduce methods to address them, including locating content streams and converting Chinese CID codes.

In a PDF document, the text content is stored in content streams, and each content stream follows its label. Locating the content stream label is in practice a special case of the exact single-string pattern matching problem. Unlike the usual single-string matching problem, label location has distinctive properties, namely a particular data model and particular pattern characteristics. We introduce a lightweight, fast string matching algorithm that takes advantage of these characteristics, prove its correctness, and verify its advantage over classic exact single-string pattern matching algorithms in the specific setting of PDF label location.

To convert CID codes to Unicode efficiently, we introduce two methods, direct mapping and range-based red-black (RB) tree mapping, and compare their advantages and disadvantages in an experimental environment. We then test both conversion methods in a network environment and verify that the RB-tree-based mapping uses less memory in a real network environment.
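The trade-off between the two CID-to-Unicode conversion methods can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the range values are hypothetical sample data, the names `build_direct_table` and `RangeMap` are invented for illustration, and a sorted list searched with `bisect` stands in for the red-black tree (both give logarithmic lookup, but a real RB tree also supports efficient insertion). Direct mapping stores one entry per CID, while range mapping stores one entry per run of consecutive CIDs.

```python
from bisect import bisect_right

# Hypothetical sample data (not taken from the dissertation): each tuple
# maps a run of consecutive CIDs to a run of consecutive Unicode code
# points as (start_cid, end_cid, start_unicode).
RANGES = [
    (1, 95, 0x0020),
    (100, 102, 0x4E00),
    (200, 203, 0x4E2D),
]


def build_direct_table(ranges):
    """Direct mapping: one table entry per CID (fast lookup, more memory)."""
    table = {}
    for start, end, base in ranges:
        for cid in range(start, end + 1):
            table[cid] = base + (cid - start)
    return table


class RangeMap:
    """Range mapping: one entry per CID range, found by binary search.

    A sorted list is used here as a stand-in for a red-black tree.
    """

    def __init__(self, ranges):
        self._ranges = sorted(ranges)
        self._starts = [r[0] for r in self._ranges]

    def lookup(self, cid):
        # Find the last range whose start is <= cid.
        i = bisect_right(self._starts, cid) - 1
        if i < 0:
            return None
        start, end, base = self._ranges[i]
        if cid > end:
            return None  # cid falls in a gap between ranges
        return base + (cid - start)
```

With this sample data the direct table holds one entry for every mapped CID, whereas the range map holds only three entries, which is the kind of memory saving the abstract attributes to the range-based method.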
Keywords/Search Tags: PDF, text extraction, confidentiality review, code conversion