
Printed Documents Source Identification Using Geometric Distortion On Text Lines

Posted on: 2017-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Hao
GTID: 2348330488954738
Subject: Information and Communication Engineering
Abstract/Summary:
Digitization has penetrated every corner of our lives. In the digital world, securing content in its various forms is essential for protecting copyright and verifying authenticity. This thesis studies the protection of printed documents, which are widely used in daily work and study. Illegal and criminal activities involving printed documents increase year by year, so judicial departments and intelligence agencies urgently need printed document forensics. In recent years, digital passive lossless forensics, which requires only a scanner and a computer, has developed into an international focus in the field of multimedia information security. Device identification is a central issue in digital passive lossless forensics. Printed document source identification (printer identification) that does not rely on professional equipment or human labor increases forensic efficiency, reduces forensic cost, and does not damage the original documents.

This thesis focuses on solving the problems that exist in the field of printed document source identification. We propose new document detection techniques that are robust to variations in toner density and to noise. The proposed techniques do not require manual restoration of the reference document, maintain high performance even when the document has only partial content, and can distinguish documents printed by different individual printers of the same brand and model.

By analyzing printed documents, we found geometric distortion on text lines. In an electronic (ideal) document, the text lines are parallel. Because printers have mechanical defects, the printing process introduces geometric distortion into the printed page. In experiments, we found that each text line in a printed document has a tiny slope angle, so the text lines are no longer parallel. The text line slopes change in a specific pattern along the printing direction. This pattern is distinctive because it varies across printer brands, models, and individual printers. We therefore propose using page text line geometric distortion as the identifying feature: printers can be characterized by the geometric distortion on text lines, which is inevitably introduced during printing.

For documents printed in a specified local area, we propose the Page Text Line Slope (PTLS) sequence and the Page Text Line Interval (PTLI) sequence, which describe the page geometric distortion in the horizontal and vertical directions, respectively. For documents printed in a random local area, we propose the Virtual Page Text Line Interval (VPTLI) sequence. The similarity of two feature sequences of different lengths is measured by the proposed Sequence Matching Distance. Four printed document source identification algorithms are proposed. We verify them using 10 individual printers from 8 models of 3 brands, with experiments on both full-page and local-area printed documents for all four algorithms. The average accuracy of the best algorithm is between 92.82% and 94.51%. The proposed algorithms overcome sensitivity to toner density and noise, do not require manual restoration of the reference document, can identify individual printers, and maintain high performance even when the document has only partial content.
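
The idea of a per-line slope sequence and a line-interval sequence can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, since the abstract does not give the thesis's definitions: the input format (baseline points already extracted for each text line of a scanned page), the least-squares line fit, the choice of a common reference column for measuring intervals, and the function names `line_fit`, `page_sequences`, and `sliding_distance`. In particular, `sliding_distance` is only a simple stand-in for comparing sequences of different lengths; it is not the thesis's Sequence Matching Distance.

```python
import numpy as np

def line_fit(xs, ys):
    """Least-squares fit y = slope * x + offset through one text line's baseline points."""
    slope, offset = np.polyfit(xs, ys, 1)
    return float(slope), float(offset)

def page_sequences(lines):
    """Build a slope sequence and an interval sequence for one page.

    `lines` is a list of (xs, ys) arrays, one per text line, ordered top to
    bottom (hypothetical input; extracting baselines from a scan is not shown).
    """
    fits = [line_fit(xs, ys) for xs, ys in lines]
    slopes = np.array([s for s, _ in fits])
    # Assumption: measure vertical spacing at one shared reference column.
    x_ref = float(np.mean([np.mean(xs) for xs, _ in lines]))
    y_ref = np.array([s * x_ref + b for s, b in fits])
    intervals = np.diff(y_ref)
    return slopes, intervals

def sliding_distance(a, b):
    """Smallest mean absolute difference of the shorter sequence over all
    contiguous alignments inside the longer one (illustrative stand-in only)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    n = len(short)
    return min(
        float(np.mean(np.abs(long_[i:i + n] - short)))
        for i in range(len(long_) - n + 1)
    )
```

Under these assumptions, a questioned page would be attributed to the reference printer whose stored sequence gives the smallest distance; sliding the shorter sequence along the longer one is one simple way to handle a document that covers only part of the page.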
Keywords/Search Tags:Printed Documents Source Identification, Page Text Lines Geometric Distortion, Page Text Line Slope (PTLS), Page Text Line Interval (PTLI), Virtual Page Text Line Interval (VPTLI)