The Research On Retrieval Technology Of Document Image

Posted on: 2018-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: P Chen
GTID: 2348330518458389
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of computer storage technology and the spread of computer vision research, the main carrier of information is moving quickly from paper documents to electronic documents. Some very important official documents are stored mainly as text images. Compared with a text file, an image file represents the original more intuitively and faithfully, and it is harder to tamper with or forge. Image files that contain handwritten signatures or bank notes, for example, better reflect authenticity and validity. In commerce, government and other fields, a great deal of important information is now exchanged as electronic images.

Text image retrieval aims to automatically and quickly retrieve, from massive collections of text images, the image files that contain target printed characters, handwritten content, or other image information. This technology has important application value for the intelligent management of text image files and the quick retrieval of important documents. A review of the literature on retrieval technology in China and abroad, and of text image retrieval in particular, shows that current text image retrieval, especially retrieval of handwritten content, still faces many problems and challenges.

In this thesis, the retrieval object is a text image and the target is printed characters, phrases, sentences, or handwritten Chinese characters, so the work belongs to content-based image retrieval. The thesis studies the relevant processing algorithms and technical solutions in two areas: text image preprocessing, and recognition and retrieval. The main work and innovations are as follows:

(1) The thesis studies the preprocessing problems of text image denoising, layout skew correction, removal of layout grid lines, and text image segmentation. For segmentation, the original row and column scanning projection algorithm is improved: information such as aspect ratio and duty cycle is used to identify and eliminate irrelevant content other than characters, which improves the segmentation accuracy for printed characters.

(2) Typical printed character recognition algorithms are studied in depth, both theoretically and experimentally. Large characters are treated as pattern information, so no character segmentation step is needed; their local texture features are analyzed and retrieved, which improves retrieval efficiency and accuracy. In addition, the thesis studies the recognition of printed Chinese characters in different fonts, redefines the key points of the HOG character description, proposes a font-invariant operator, PHOG, and reports a small-scale experimental test and analysis.

(3) For the recognition of handwritten content, a new idea is proposed: bypass the traditional "segment first, then recognize" approach and extract texture features directly from the text image. The implementation searches for and matches the target directly using SIFT features, which are scale-invariant, rotation-invariant and affine-invariant. It first determines four candidate sub-regions and then analyzes the details of each sub-region to decide whether it contains the target information. The SIFT matching points are also improved so that the SIFT operator describes Chinese characters better. Several texture analysis methods are proposed to enhance image texture information and improve the accuracy of sub-region detail analysis, thereby improving the overall accuracy of text image retrieval for handwritten content.
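The abstract does not spell out the improved projection segmentation in item (1). As a rough illustration only, the Python sketch below shows the general idea of row and column projection followed by a filter on aspect ratio and "duty cycle". It assumes that duty cycle means the fraction of ink pixels inside a candidate box. The function names (`runs`, `segment`), the threshold values, and the toy page are all hypothetical and are not taken from the thesis.

```python
def runs(profile):
    """Return (start, end) index pairs for each maximal run of nonzero values."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(image, min_aspect=0.3, max_aspect=3.0, min_fill=0.1, max_fill=0.95):
    """Split a binary image (1 = ink) into character boxes (top, left, bottom, right).

    Thresholds are illustrative assumptions, not values from the thesis.
    """
    boxes = []
    # Row projection: rows with no ink separate text lines.
    row_profile = [sum(row) for row in image]
    for top, bottom in runs(row_profile):
        band = image[top:bottom]
        # Column projection inside the line: empty columns separate characters.
        col_profile = [sum(col) for col in zip(*band)]
        for left, right in runs(col_profile):
            # Tighten the box vertically around the ink it actually contains.
            cell_rows = runs([sum(row[left:right]) for row in band])
            t, b = top + cell_rows[0][0], top + cell_rows[-1][1]
            width, height = right - left, b - t
            ink = sum(sum(row[left:right]) for row in image[t:b])
            aspect = width / height
            fill = ink / (width * height)  # assumed meaning of "duty cycle"
            # Discard boxes whose shape or ink density is unlike a character,
            # e.g. long thin ruled lines.
            if min_aspect <= aspect <= max_aspect and min_fill <= fill <= max_fill:
                boxes.append((t, left, b, right))
    return boxes


# Toy page: two glyph-like blobs above a horizontal ruled line.
page = [
    "............",
    ".###..#.#...",
    ".#.#..###...",
    ".###..#.#...",
    "............",
    "############",
    "............",
]
image = [[1 if ch == "#" else 0 for ch in row] for row in page]
print(segment(image))  # [(1, 1, 4, 4), (1, 6, 4, 9)]
```

On this toy page the ruled line has an aspect ratio of 12 and a fill of 1.0, so it is rejected, while the two blobs pass. Real document images would need thresholds tuned to the font and scan resolution, and touching or split characters would need handling that this sketch does not attempt.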
Keywords/Search Tags:Document image retrieval, Printed Chinese characters, Handwritten Chinese characters, SIFT, Texture analysis