
Design and Implementation of an Intelligent Annotation System for Historical Document Images

Posted on: 2020-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: W G Huang
GTID: 2428330590460929
Subject: Electronic and communication engineering
Abstract/Summary:
Historical documents are an important medium of Chinese culture, which has a long and profound history. Digitization is an effective way to pass on traditional Chinese culture while promoting the preservation and reuse of historical documents. Given the huge amount of data involved, it is impossible to process these documents by manual input alone. In the era of big data, abundant data and computing resources have driven the development of deep learning, which has made significant progress in computer vision. Compared with traditional machine learning methods, however, deep learning requires a large amount of data. Because the lack of datasets of Chinese historical documents limits research on deep learning algorithms, systems suitable for large-scale data annotation are urgently needed. Since annotating data manually takes enormous effort, the data can first be pre-processed with machine learning methods to reduce the manual annotation workload. Based on this analysis, the work of the thesis is summarized as follows.

1) We employed vertical projection to perform column segmentation and character segmentation on the original data. Based on the resulting single-character dataset, when page-level text annotation is used in the training process, the prototype learning-based convolutional neural network achieves higher accuracy and better generalization than the Softmax model.

2) We analyzed the requirements for annotating historical documents and designed an annotation system based on Alibaba Cloud, which provides an effective tool for annotating and managing large-scale data. It also provides a highly practical prototype system for digitizing historical documents.

3) We built and published a dataset of Chinese historical documents, including 2,000 pieces of Tripitaka Koreana in Han with character-level location information and text annotation. Based on this dataset, an approach to detecting and recognizing the text of historical documents was studied, and a corresponding algorithm service was developed.
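
The projection-based segmentation mentioned in item 1) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a binarized page stored as a NumPy array in which ink pixels are 1 and background pixels are 0, and it assumes that any profile position with zero ink separates two segments. The function names `ink_runs`, `segment_columns`, and `segment_characters` are hypothetical. Text columns are found from the per-column ink profile, and characters inside a column are then found from the per-row ink profile of that column.

```python
import numpy as np


def ink_runs(profile):
    """Return (start, end) pairs of consecutive positions whose ink count is > 0."""
    runs = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of ink begins
        elif count == 0 and start is not None:
            runs.append((start, i))        # the run ended at the previous position
            start = None
    if start is not None:                  # run reaches the edge of the image
        runs.append((start, len(profile)))
    return runs


def segment_columns(binary):
    """Split a page into text columns using the per-column ink profile."""
    return ink_runs(binary.sum(axis=0))


def segment_characters(binary, column_span):
    """Split one text column into characters using its per-row ink profile."""
    left, right = column_span
    return ink_runs(binary[:, left:right].sum(axis=1))


# Toy page (assumed data): two characters in one column, one in another.
page = np.zeros((6, 7), dtype=np.uint8)
page[0:2, 1:3] = 1
page[3:5, 1:3] = 1
page[1:4, 5:6] = 1

columns = segment_columns(page)
characters = [segment_characters(page, span) for span in columns]
```

On real scans the zero-ink assumption rarely holds because of noise, touching strokes, and ruling lines, so a practical version would likely need a threshold on the profile and a minimum gap width.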
Keywords/Search Tags: Historical document digitization, Deep learning, Annotation system, Cloud service