Recognition and Detection of Chinese Characters in Historical Documents Based on Deep Learning

Posted on: 2020-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: H L Yang
Full Text: PDF
GTID: 2428330590984523
Subject: Signal and Information Processing
Abstract/Summary:
Historical documents are invaluable treasures created by our ancestors over past centuries. An important and efficient way to understand and protect these documents is digitization, in which the text and graphic symbols in physical documents are systematically converted into digital records. The quality of this process depends strongly on the accuracy of character detection and recognition. Historical character recognition has been discussed extensively in the literature as a branch of Optical Character Recognition (OCR) research, but only a few studies have addressed historical character detection, even though detection can support research on historical documents in several respects. In recent years, deep learning methods have achieved substantial success in many computer vision applications. We believe that recognizing and detecting characters and text in historical documents can likewise benefit from incorporating deep learning models into traditional algorithms. In this thesis, based on a new dataset of historical documents, we study these problems and achieve strong performance by applying deep neural networks flexibly. Specifically, the content and innovations of this thesis consist of the following three major components:

1) We introduce a historical Chinese text recognizer that is trained on data labelled at the page level, without alignment of individual text lines. We propose the Adaptive Gradient Gate (AGG) to reduce the influence of misalignments between text line images and labels. With the AGG, the error rate of the proposed text recognizer is reduced by over 35%.

2) We propose a novel method called the recognition guided detector (RGD), which achieves tight Chinese character detection in historical documents. The proposed RGD consists of two simultaneously trained convolutional neural networks (CNNs): a recognition guided proposal network (RGPN) that provides context information about the text, and a detection network that uses this information to localize each character accurately. Experimental results show that the proposed method achieves comparable or even better performance with far fewer parameters than several state-of-the-art object detection and text detection methods.

3) We propose a novel method for tightening the bounding boxes roughly detected by deep learning-based detectors; it is the first to adopt reinforcement learning for more precise localization of Chinese characters in historical documents. We implement the proposed method and verify its feasibility.
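The third contribution describes tightening roughly detected boxes with reinforcement learning. The abstract does not specify the state, action set, or reward, so the following is only a minimal sketch of how such a box-refinement environment is commonly set up, not the thesis's method. The four shrink actions, the 10% step size, the sign-of-IoU-change reward, and the greedy oracle standing in for a learned policy are all assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Hypothetical action set: each action pulls one side of the box inward.
ACTIONS = ("shrink_left", "shrink_top", "shrink_right", "shrink_bottom")


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def apply_action(box: Box, action: str, step: float = 0.1) -> Box:
    """Move one side inward by `step` times the current width or height."""
    x1, y1, x2, y2 = box
    dx, dy = step * (x2 - x1), step * (y2 - y1)
    if action == "shrink_left":
        x1 += dx
    elif action == "shrink_top":
        y1 += dy
    elif action == "shrink_right":
        x2 -= dx
    elif action == "shrink_bottom":
        y2 -= dy
    else:
        raise ValueError(f"unknown action: {action}")
    return (x1, y1, x2, y2)


def reward(old: Box, new: Box, gt: Box) -> float:
    """Assumed reward: +1 if the step increased IoU with the ground truth, else -1."""
    return 1.0 if iou(new, gt) > iou(old, gt) else -1.0


def greedy_refine(box: Box, gt: Box, max_steps: int = 20) -> Box:
    """Oracle stand-in for a learned policy: take the best action until IoU stops improving."""
    for _ in range(max_steps):
        best = max(ACTIONS, key=lambda a: iou(apply_action(box, a), gt))
        candidate = apply_action(box, best)
        if iou(candidate, gt) <= iou(box, gt):
            break
        box = candidate
    return box
```

In a trained system the oracle would be replaced by a policy (for example a Deep Q-learning Network, as listed in the keywords) that picks actions from image features, since the ground-truth box is unavailable at test time.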
Keywords/Search Tags:Deep Learning, Historical Documents, Detection and Recognition of Chinese Characters, Reinforcement Learning, Deep Q-learning Network