Research On Traditional Chinese Image Textual System Based On Deep Learning

Posted on: 2021-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wei
GTID: 2428330605482464
Subject: Computer technology
Abstract/Summary:
Storing ancient books and documents in electronic form is an effective way to disseminate and protect them. Digital textualization of ancient book images consists mainly of two tasks: character positioning and character recognition. Deep learning has become an active research direction in recent years, with significant results in image recognition, object detection and natural language processing. Ancient Chinese books mix printed and handwritten characters, vary widely in printed fonts, and contain many kinds of noise, all of which place higher demands on character positioning and recognition. To reduce the labor cost of transcribing traditional Chinese images into electronic form, this thesis designs a textualization approach that uses deep learning as the main recognition method, complemented by manual correction. Web visualization pages lower the threshold for users, yielding a traditional Chinese character textualization system with complete algorithms and feasible results. Deep learning is already widely applied to simplified Chinese image text, but work on traditional Chinese images, and on Chinese ancient books in particular, remains comparatively scarce. Research on the textualization of traditional Chinese images is therefore significant both for the application of deep learning and for the study of character positioning and character recognition in Chinese images. This thesis reports research and experiments on the textualization of ancient books. The main contents and innovations are as follows:

1. Because no ready-made data set was available, a data set labeling algorithm was designed: character positions are first estimated with the MSER algorithm, and the final character position data set is then obtained through manual correction. A character positioning algorithm for single ancient book images was designed on top of a one-stage deep learning object detection algorithm. VGG16 serves as the backbone convolutional network, and character positions are detected with the anchors + bounding boxes method on feature maps from different layers. The thesis compares traditional image processing with deep learning for traditional Chinese character positioning, shows how the deep learning method performs on different fonts and text sizes, and analyzes the reasons for the performance differences between the algorithms.

2. A convolutional neural network model for recognizing characters in traditional Chinese ancient book images was designed and built by combining the Inception module with the residual network module. Regularization techniques, including L1 and L2 regularization, data augmentation and dropout, were adopted to improve the model's ability to generalize across different texts. The thesis compares the proposed model with several mainstream deep learning models on printed and photocopied characters from real ancient books, analyzes the performance of different structural variants of the proposed model, and shows the effects of different regularization methods on model performance.

3. An end-to-end textualization algorithm from image to digital text was designed and implemented by combining the character positioning and character recognition algorithms. A Web system for textualizing traditional Chinese images was built with this algorithm at its core and SSM (Spring + SpringMVC + MyBatis) as the framework. Beyond the core function of textualizing ancient book images, the system provides user login, user data storage, positioning result correction, recognition result correction and recognition result download.
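
The anchors + bounding boxes idea mentioned for character positioning can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the feature map sizes, scales and aspect ratios below are assumptions chosen for illustration, and the names `make_anchors` and `iou` are hypothetical. The sketch places default boxes at every cell of several square feature maps, so that coarser maps cover larger characters, and computes the intersection over union that a one-stage detector typically uses to match anchors to labeled character boxes.

```python
from itertools import product
from typing import List, Tuple

# A box is (center_x, center_y, width, height), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def make_anchors(feature_sizes: List[int],
                 scales: List[float],
                 aspect_ratios: List[float]) -> List[Box]:
    """Generate default boxes for square feature maps of the given sizes.

    feature_sizes, scales and aspect_ratios are illustrative assumptions,
    not values taken from the thesis.
    """
    if len(feature_sizes) != len(scales):
        raise ValueError("one scale per feature map is required")
    anchors: List[Box] = []
    for size, scale in zip(feature_sizes, scales):
        for row, col in product(range(size), repeat=2):
            # Center of the current feature map cell.
            cx = (col + 0.5) / size
            cy = (row + 0.5) / size
            for ratio in aspect_ratios:
                # Keep the area at scale**2 while varying width / height.
                w = scale * ratio ** 0.5
                h = scale / ratio ** 0.5
                anchors.append((cx, cy, w, h))
    return anchors


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two center-format boxes."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two hypothetical feature maps: a fine 4x4 map and a coarse 2x2 map.
    anchors = make_anchors([4, 2], [0.2, 0.5], [1.0, 2.0])
    print(len(anchors))  # (16 + 4) cells * 2 aspect ratios = 40
```

In a detector of this kind, each anchor whose IoU with a labeled character box exceeds a chosen threshold would be treated as a positive example during training; the threshold is a further design choice not stated in the abstract.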
Keywords/Search Tags: Deep Learning, Convolutional Neural Network, Character Recognition, Text Positioning, Image Processing