Research On Text-Independent Source Printer Authentication For Printed Documents

Posted on: 2017-12-19 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: T H Fang | Full Text: PDF
GTID: 1368330512454938 | Subject: Signal and Information Processing
Abstract/Summary:
With the widespread use of laser printers, printed documents play an increasingly important role in daily life and work, and civil disputes and criminal cases involving printed documents are becoming more common. The growing demand for information security calls for new document examination techniques that identify the source printer of a questioned document, a task known as Source Printer Authentication (SPA). Existing printed document examination techniques have made considerable progress in some respects, but many critical issues remain unresolved.

Against this background, this thesis surveys current SPA methods, analyzes the difficulties of SPA, presents a practicable scheme based on texture features, and proposes solutions to several open problems in SPA. The main contents and contributions are as follows.

First, the existing image acquisition apparatus is upgraded. The existing apparatus, a high-magnification microscopic image acquisition system, is inefficient and time-consuming, especially when acquiring images of an A4-size printed document. To reduce the workload and avoid human error as far as possible, a large-field microscopic image acquisition system is designed.

Secondly, an SPA method based on feature fusion and selection is proposed. The relationship between the rotational speed and the scan-line spacing of a laser printer is analyzed using a theoretical model, which provides a theoretical basis for extracting GLCM statistical texture features in the horizontal and vertical directions only. To capture both the spatial-domain and frequency-domain properties of character images, the GLCM and DWT statistical texture features are combined, and the combined features then undergo two rounds of feature selection.
First, the ReliefF algorithm is used to select from the combined statistical texture features; the SVM-RFE feature selection algorithm, which is based on data learning, is then chosen for the second round of selection. Experimental results on three different sample sets show that the GLCM and DWT statistical texture features are effective and that feature selection helps improve the classification performance of SPA.

Thirdly, current SPA methods fail when the training and testing documents share no identical characters. To address this problem, an improved Local Binary Pattern (LBP) texture descriptor based on Gaussian pyramid decomposition of images is proposed. LBP is a statistical histogram feature of an image that is little affected by character structure, and the improved LBP descriptor effectively extracts local texture differences in character images. Multi-scale techniques such as the Gaussian pyramid capture image information at different resolutions and at coarse and fine scales, which makes feature extraction more effective. The experimental results show that the multi-scale LBP technique is effective when the training and testing documents share no identical characters.

Fourthly, an SPA method based on sparse representation is proposed. To further improve the classification accuracy of SPA, a dictionary learning algorithm based on the GPDCLBP feature and FDDL is proposed: the feature data are first learned with the FDDL method and then classified by sparse representation.
Experimental results on the same Chinese and English character sets show that the algorithm achieves better classification performance with little sample data and in relatively little time.

Finally, an SPA method based on data mining is proposed. An analysis-of-variance model of the printer texture factor is built to test the significance of that factor, and the experimental results show that texture information can serve as the foundation of SPA. A two-way model with a texture factor and a character factor is then built to analyze and test the influence of both factors on printed document images. The two-factor analysis of variance model is used to extract the character factor from the image features and eliminate it, and identification performance is further improved by information fusion; good discrimination is obtained even when the two printed documents contain no characters in common and the total number of characters is small. The experimental results show that the algorithm achieves better classification performance when the training and testing documents share no identical characters and contain few characters.

In summary, this thesis further investigates the SPA problem, studies the texture information of printed documents, and proposes effective schemes for the key techniques of SPA. The experimental results are promising and improve substantially on previous methods, making the SPA technique more widely applicable.
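
The two-factor analysis of variance step in the data-mining contribution can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a balanced layout with one scalar texture feature per (printer, character) cell and a purely additive model. The function names, the synthetic data, and the layout are all hypothetical and chosen only for illustration.

```python
import numpy as np


def two_way_decompose(features):
    """Split features[i, j] (printer i, character j) into additive effects.

    Assumes a balanced design with one scalar feature value per
    (printer, character) cell; this layout is an illustrative assumption.
    """
    x = np.asarray(features, dtype=float)
    grand = x.mean()
    printer = x.mean(axis=1) - grand      # row (texture/printer) effects
    character = x.mean(axis=0) - grand    # column (character) effects
    residual = x - grand - printer[:, None] - character[None, :]
    return grand, printer, character, residual


def f_statistics(features):
    """F ratios for the printer and character factors (no replication)."""
    x = np.asarray(features, dtype=float)
    a, b = x.shape
    _, printer, character, residual = two_way_decompose(x)
    ms_printer = b * np.sum(printer ** 2) / (a - 1)
    ms_character = a * np.sum(character ** 2) / (b - 1)
    ms_residual = np.sum(residual ** 2) / ((a - 1) * (b - 1))
    return ms_printer / ms_residual, ms_character / ms_residual


def remove_character_effect(features):
    """Subtract the estimated character effect from every column."""
    x = np.asarray(features, dtype=float)
    _, _, character, _ = two_way_decompose(x)
    return x - character[None, :]


# Synthetic data: 3 hypothetical printers x 4 hypothetical characters.
rng = np.random.default_rng(0)
printer_true = np.array([-1.0, 0.0, 1.0])
character_true = np.array([-2.0, -0.5, 0.5, 2.0])
data = 10.0 + printer_true[:, None] + character_true[None, :]
data = data + rng.normal(scale=0.05, size=data.shape)

f_printer, f_character = f_statistics(data)
adjusted = remove_character_effect(data)
print("F (printer):", f_printer, "F (character):", f_character)
print("column means after adjustment:", adjusted.mean(axis=0))
```

After the adjustment every character column has the same mean, while the per-printer means are unchanged. This is the sense in which the character factor is "eliminated" in the sketch. A real feature vector would need the same treatment per dimension, or a multivariate model.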
Keywords/Search Tags: printed document authentication, texture features, feature selection, multi-scale LBP, dictionary learning, factor analysis