Historical Document Images Intelligent Annotation System Design And Implementation

Posted on:2020-11-25

Degree:Master

Type:Thesis

Country:China

Candidate:W G Huang

GTID:2428330590460929

Subject:Electronic and communication engineering

Abstract/Summary:

Historical documents are an important carrier of Chinese culture, which has a long and profound history. Digitization is an effective way to pass on traditional Chinese culture while also promoting the preservation and reuse of historical documents. Given the huge amount of data, however, it is impossible to process it by manual input alone. In the era of big data, abundant data and computing resources have driven the development of deep learning, which has brought significant progress in computer vision. Compared with traditional machine learning methods, however, deep learning requires a large amount of data. Because the lack of Chinese historical document datasets limits research on deep learning algorithms, there is an urgent need for systems suited to large-scale data annotation. Since manual annotation requires enormous effort, the data can first be pre-processed with machine learning methods to reduce the manual workload. Based on this analysis, the main contributions of this thesis are as follows.
1) We employed vertical projection to segment the original data into columns and then into characters. On the resulting single-character dataset, with page-level text annotation used during training, the prototype learning-based convolutional neural network achieves higher accuracy and better generalization than the Softmax model.
2) We analyzed the requirements for annotating historical documents and designed an annotation system based on Alibaba Cloud, which provides an effective tool for annotating and managing large-scale data. It also serves as a highly practical prototype system for digitizing historical documents.
3) We built and published a dataset of Chinese historical documents comprising 2,000 images of the Tripitaka Koreana written in Chinese characters, with character-level location information and text annotations. Based on this dataset, we studied methods for detecting and recognizing text in historical documents, and we developed the algorithms into a service.
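The projection-based column and character segmentation in point 1) can be illustrated with a minimal sketch. It assumes a page that has already been binarized into rows of 0/1 pixels (1 = ink); the function names and the `min_ink`/`min_width` thresholds are hypothetical and not taken from the thesis. Columns are cut at the gaps of the vertical projection (ink count per pixel column); inside each column, characters are cut at the gaps of the per-row ink count, which is an assumption about how the character step works.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a binarized page as rows of 0/1 pixels, 1 = ink.
Image = List[List[int]]
Span = Tuple[int, int]  # half-open pixel interval [start, end)


def vertical_projection(img: Image) -> List[int]:
    """Ink count of every pixel column (projection onto the x-axis)."""
    width = len(img[0]) if img else 0
    return [sum(row[x] for row in img) for x in range(width)]


def row_projection(img: Image, x0: int, x1: int) -> List[int]:
    """Ink count of every pixel row inside the column slice [x0, x1)."""
    return [sum(row[x0:x1]) for row in img]


def split_by_projection(profile: List[int], min_ink: int = 1,
                        min_width: int = 1) -> List[Span]:
    """Cut a profile at its gaps: return runs whose values are >= min_ink.

    Runs shorter than min_width are dropped as noise (hypothetical threshold).
    """
    spans: List[Span] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i  # an ink run begins
        elif value < min_ink and start is not None:
            if i - start >= min_width:
                spans.append((start, i))
            start = None  # the run ended at a gap
    if start is not None and len(profile) - start >= min_width:
        spans.append((start, len(profile)))  # run touching the far edge
    return spans


def segment_page(img: Image) -> List[List[Tuple[int, int, int, int]]]:
    """Character boxes (x0, y0, x1, y1), grouped by text column, left to right."""
    boxes = []
    for x0, x1 in split_by_projection(vertical_projection(img)):
        chars = split_by_projection(row_projection(img, x0, x1))
        boxes.append([(x0, y0, x1, y1) for y0, y1 in chars])
    return boxes


page = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
]
boxes = segment_page(page)
# [[(0, 0, 2, 2), (0, 3, 2, 4)], [(4, 0, 5, 2), (4, 3, 5, 4)]]
```

Classical vertical text is read right to left, so a real pipeline would likely reverse the column order; real scans would also need binarization and thresholds tuned to noise and touching characters, which this sketch leaves out.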
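Point 1) compares a prototype learning-based network with a Softmax model, but the abstract does not give the formulation. The sketch below is therefore only illustrative and assumes a distance-based cross-entropy head of the kind used in convolutional prototype learning: each class keeps one or more prototype vectors in feature space, a feature is scored by its negative squared distance to every prototype, and inference uses the nearest-prototype rule. The CNN feature extractor and the gradient updates of the prototypes are omitted; all names, the `gamma` scale, and the example prototypes are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
# Hypothetical layout: class label -> list of prototype vectors in feature space.
Prototypes = Dict[str, List[Vector]]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dce_loss(feature: Vector, label: str, protos: Prototypes,
             gamma: float = 1.0) -> float:
    """Distance-based cross-entropy for one sample.

    Every prototype gets the logit -gamma * squared distance; the probability of
    a class is the softmax mass summed over that class's prototypes, and the
    loss is its negative log.
    """
    logits = {c: [-gamma * sq_dist(feature, p) for p in ps]
              for c, ps in protos.items()}
    everything = [v for vs in logits.values() for v in vs]
    return logsumexp(everything) - logsumexp(logits[label])


def prototype_loss(feature: Vector, label: str, protos: Prototypes) -> float:
    """Regularizer pulling the feature toward its nearest true-class prototype."""
    return min(sq_dist(feature, p) for p in protos[label])


def predict(feature: Vector, protos: Prototypes) -> str:
    """Nearest-prototype rule: the class that owns the closest prototype."""
    return min(protos, key=lambda c: min(sq_dist(feature, p) for p in protos[c]))


# Toy 2-D example with hypothetical prototypes for two character classes.
protos = {"佛": [(0.0, 0.0)], "經": [(4.0, 0.0), (0.0, 4.0)]}
near_first = predict((1.0, 0.5), protos)   # "佛"
near_second = predict((3.5, 0.0), protos)  # "經"
```

Unlike a Softmax layer, where a class score is an unbounded linear function of the feature, every decision here is tied to a distance from a learned prototype, so a character class can be represented by several prototypes (for example, different glyph variants).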

Keywords/Search Tags:

Historical document digitization, Deep learning, Annotation system, Cloud service