Research And Implementation Of Text Recognition For Confucius Ancient Book Document Images

Posted on: 2022-10-12; Degree: Master; Type: Thesis
Country: China; Candidate: J Hou
GTID: 2518306326971579; Subject: Software engineering
Abstract/Summary:
Ancient archives carry the wisdom of earlier scholars and a nation's sentiments, and they are a precious cultural asset of China. Many of them are now weathered or damaged and cannot be accessed or shared by the public. Digitizing ancient archives makes them easier to study and exchange, and it supports the inheritance and development of traditional culture, so the work has both historical and scientific research value. Digitization mainly relies on key technologies such as image preprocessing, document image segmentation, and character image recognition. Addressing character recognition in images of the ancient archives of the Confucius' Mansion, this thesis makes the following contributions.

First, characters in the same column often differ greatly in size, which makes segmentation difficult. To address this, a secondary-loop projection segmentation algorithm is proposed as an improvement on the single-loop projection segmentation algorithm. Given a preprocessed binary image, the method first applies single-loop projection segmentation to extract each text column and then divides each column into characters according to its horizontal projection. Because a single text column in ancient documents may contain several columns of small characters, two adjacent small characters can be detected as one Chinese character. The projection segmentation is therefore applied again to each detected character, with a suitable threshold set to avoid over-segmentation. Experiments show that the secondary-loop segmentation algorithm locates each character more accurately.

Second, HCCR-Inc BN, a handwritten Chinese character recognition model based on convolutional neural networks, is proposed. Deep learning models often have a large number of weight parameters, converge slowly during training, and are too large to embed in portable devices. To address these problems, the thesis applies the Inception-v2 module, which takes both the depth and the width of the network into account and fuses and maps different features. In addition, 1×1 convolutions reduce the number of parameters and feature-map channels, and batch normalization and a moving-average algorithm are applied to optimize the model. The stored model requires only 26 MB. Comparative experiments on the public handwritten Chinese character data sets CASIA-HWDB1.1 and ICDAR2013 demonstrate the recognition effectiveness of the HCCR-Inc BN model.

Third, existing offline handwritten simplified Chinese character data sets cannot be used effectively to recognize the traditional Chinese characters in ancient archives. Building on the preprocessing and segmentation of the Confucius' Mansion archive images, each character image is labeled manually and the data set is then augmented, yielding a new offline handwritten ancient Chinese character data set, CMAD (Confucius' Mansion Archives Data). The data set currently contains 1,131 classes of traditional Chinese characters and 339,300 samples.

Finally, an ancient document digitization system is designed and developed that implements the whole pipeline of “image uploading → image preprocessing → image segmentation → character image recognition → electronic document generation”. In summary, the secondary-loop projection segmentation algorithm, the offline handwritten Chinese character recognition model HCCR-Inc BN, the handwritten traditional Chinese character data set, and the ancient document recognition system consolidate the foundation for digitizing ancient archives and have practical value for protecting, inheriting, and applying traditional culture.
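The two-pass projection idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the half-open box convention `(top, bottom, left, right)`, and the `min_width` threshold used to guard against over-segmentation are all assumptions made for illustration. The image is taken to be a binarized page given as a list of rows, with 1 for ink and 0 for background.

```python
from typing import List, Tuple

Image = List[List[int]]           # binarized page: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]   # (top, bottom, left, right), half-open


def runs(profile: List[int], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans where the profile is non-zero.

    Spans shorter than min_len are discarded.
    """
    spans, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            if i - start >= min_len:
                spans.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_len:
        spans.append((start, len(profile)))
    return spans


def column_profile(img: Image, top: int, bottom: int, left: int, right: int) -> List[int]:
    """Vertical projection: ink count of each pixel column inside the region."""
    return [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]


def row_profile(img: Image, top: int, bottom: int, left: int, right: int) -> List[int]:
    """Horizontal projection: ink count of each pixel row inside the region."""
    return [sum(img[r][left:right]) for r in range(top, bottom)]


def segment(img: Image, min_width: int = 2) -> List[Box]:
    """Two-pass projection segmentation (illustrative sketch).

    Pass 1 splits the page into text columns, then each column into cells.
    Pass 2 re-projects every cell vertically to separate side-by-side small
    characters. A cell is split only if more than one sub-span is at least
    min_width pixels wide (a hypothetical over-segmentation guard).
    """
    height, width = len(img), len(img[0])
    boxes: List[Box] = []
    for left, right in runs(column_profile(img, 0, height, 0, width)):
        for top, bottom in runs(row_profile(img, 0, height, left, right)):
            # runs() returns offsets relative to the region, so shift them back.
            top_abs, bottom_abs = top, bottom  # row region started at 0
            sub = runs(column_profile(img, top_abs, bottom_abs, left, right), min_width)
            if len(sub) > 1:
                boxes.extend((top_abs, bottom_abs, left + l, left + r) for l, r in sub)
            else:
                boxes.append((top_abs, bottom_abs, left, right))
    return boxes


# Toy page: one large character above two small side-by-side characters,
# plus a second text column containing a single character.
page = [
    [1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
]
print(segment(page))
```

On the toy page, the first pass alone would report the two small characters as a single cell because the large character above them bridges the gap in the column projection. The second pass separates them. Raising `min_width` makes the second pass more conservative, which is one simple way to keep thin, detached strokes from being cut off as separate characters.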
Keywords/Search Tags:Image preprocessing, Image segmentation, Deep learning, Handwritten Chinese character recognition, Digitization of ancient documents