Research On Document Image Retrieval Technology Based On Combined Feature

Posted on: 2020-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Feng
GTID: 2428330620460024
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of intelligent terminals and computer technology, document images have gradually been adopted as a substitute for paper documents by government departments, companies, schools and other institutions. Locating and retrieving document images quickly is important for the spread of the electronic office. At present, there are two main approaches in the field of document image retrieval, based on content information and on features, respectively. However, existing schemes still have limitations in accuracy, resistance to interference and security. This dissertation therefore studies several key technologies in the field of document image retrieval, proposes a document image retrieval method with better performance, and develops a prototype system based on it.

Firstly, according to the layout characteristics of document images, we use document layout analysis to locate text lines. In this process, we use an optimized X-Y projection recursive algorithm to obtain candidate text regions, and propose an LA-based CNN algorithm to distinguish images from text. A bottom-up text line location technique is then presented. Experimental results show that the proposed text line location technique performs well in practical simulations.

Secondly, because retrieval accuracy is relatively low when only shallow features are used, we propose a document image retrieval technique based on combined features. By analyzing the deep CNN features and the shallow ORB features of the document image, we construct combined features that further improve retrieval accuracy. On this basis, considering that text lines of different lengths have features of different dimensions and that feature offsets may occur, a feature matching algorithm based on the Gaussian distribution is proposed. Together, these form the complete document image retrieval technique based on combined features. The experimental results show that the text line location technique based on layout analysis proposed in this dissertation performs well in practical experiments.

Finally, based on the research into these key technologies, this dissertation designs and implements a retrieval prototype system from the perspective of practical application. In this prototype system, we use the text line as the basic retrieval unit of the document image, and use the combined features and the corresponding matching algorithm to perform document image retrieval. The experimental results show that the proposed system is more robust to noise than other typical retrieval systems. Furthermore, it solves, to some extent, the problem that global features are not applicable to local image matching.
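
The abstract does not describe the optimizations applied to the X-Y projection recursive algorithm. As a rough illustration only, the classic recursive X-Y cut that such methods build on might be sketched as follows. The function names `ink_runs` and `xy_cut`, the `min_gap` parameter and the convention that ink pixels are 1 and background pixels are 0 are all assumptions made for this sketch and are not taken from the thesis.

```python
import numpy as np


def ink_runs(profile, min_gap):
    """Split a 1-D projection profile into (start, end) runs containing ink.

    Runs separated by fewer than `min_gap` empty positions are merged.
    `end` is exclusive. `min_gap` is a hypothetical tuning parameter.
    """
    ink = np.flatnonzero(profile > 0)
    if ink.size == 0:
        return []
    runs = []
    start = prev = int(ink[0])
    for pos in ink[1:].tolist():
        if pos - prev - 1 >= min_gap:  # whitespace gap wide enough to cut
            runs.append((start, prev + 1))
            start = pos
        prev = pos
    runs.append((start, prev + 1))
    return runs


def xy_cut(binary, min_gap=2, top=0, left=0, split_rows=True, stalled=False):
    """Classic recursive X-Y cut on a 0/1 image (1 = ink, an assumption).

    Alternates between cutting along rows and along columns. Recursion stops
    when neither direction can split a region any further. Returns boxes as
    (top, left, bottom, right) with exclusive bottom and right.
    """
    # Project ink onto the axis being cut: row sums or column sums.
    profile = binary.sum(axis=1 if split_rows else 0)
    runs = ink_runs(profile, min_gap)
    boxes = []
    for start, end in runs:
        if split_rows:
            sub, t, l = binary[start:end, :], top + start, left
        else:
            sub, t, l = binary[:, start:end], top, left + start
        if len(runs) == 1 and stalled:
            # No split in this direction or the previous one: a leaf region.
            boxes.append((t, l, t + sub.shape[0], l + sub.shape[1]))
        else:
            boxes += xy_cut(sub, min_gap, t, l, not split_rows,
                            stalled=(len(runs) == 1))
    return boxes


# Toy page: one full-width text line above two side-by-side blocks.
page = np.zeros((20, 20), dtype=np.uint8)
page[2:6, 2:18] = 1
page[10:14, 2:8] = 1
page[10:14, 12:18] = 1
regions = xy_cut(page, min_gap=2)
```

In a pipeline like the one outlined in the abstract, regions of this kind would be the candidates passed on to the image/text classifier. How the thesis thresholds the projections or handles noise is not stated, so this sketch treats any nonzero projection value as ink.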
Keywords/Search Tags:Document Image Retrieval System, Convolutional Neural Network, Layout Analysis, Combined Feature