
Research On Multi-scene Ancient Chinese Text Recognition

Posted on: 2021-06-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K L Wang
Full Text: PDF
GTID: 1488306461963499
Subject: Graphic communication engineering
Abstract/Summary:
Characters are a product of the development of human civilization and an abstract, concentrated carrier of information. Through reading and writing they overcome the limits that time and space place on language, and they have developed together with human history. Before digital preservation equipment existed, characters were preserved on animal bones, metal, stone, bamboo, wood, paper and other materials by carving, bronze casting, or writing with a brush or hard pen. With the development of imaging technology, characters can also be propagated as images such as photographs and scans. Text recognition technology originated in document text recognition and is still extensively studied for handwritten text and natural scene text.

At present, most text recognition research targets English. Chinese differs considerably from English in appearance, in the rules of stroke combination and in the number of character categories, so dedicated research on Chinese text recognition is of great significance. Within this area there is little research on the recognition of ancient Chinese characters, which are difficult to recognize. The chirography and font types of ancient Chinese are varied, the scenes in which it appears are extremely complex, and the number of character categories is extremely large. Recognition methods therefore need strong feature extraction and classification capabilities and must be robust to complex and changeable scenes. Beyond the large number of categories and the diversity of scenes and fonts, the differing writing habits of people from different dynasties further increase the difficulty of recognition. This dissertation therefore studies Multi-scene Ancient Chinese Text Recognition (MACR). Building on deep learning, which has achieved great success in many machine vision fields, the main research work consists of three parts.

(1) To support a deep learning solution for multi-scene ancient Chinese text recognition, a dataset is first established. It consists of an artificially synthesized training set and a testing set collected from real scenes. The testing set samples cover most multi-scene ancient Chinese text, font types and distribution scenes. The chirography types include Oracle bone inscriptions, Bronze inscriptions, Seal characters, Official script, Cursive script, Regular script and Running script, and the distribution scenes include calligraphy, epitaphs, plaques and cliffs. The training set samples cover 3755 characters and a variety of font types and distribution scenes. The feature distribution of the dataset is also analyzed.

(2) Recognition research is conducted on this dataset. To provide a reference point for comparing the methods, a human subjective recognition experiment is performed first, yielding an average recognition accuracy of 52.98%. Several classic, well-performing convolutional neural networks are then applied as basic recognition methods. Their accuracy exceeds that of human subjective recognition, which shows that deep learning is suitable and practical for the multi-scene ancient Chinese text recognition task and that the synthesized training set is effective. Analysis of the basic recognition results shows a positive correlation between confidence and recognition accuracy. Based on this observation, a confidence-based Multi-Model Ensemble MACR method (MME) is proposed. Compared with the basic methods, MME greatly improves recognition accuracy.

(3) Further recognition research is conducted on the same dataset. The basic recognition results and the dataset analysis reveal inconsistencies between the training and testing sets in both the number of character categories and the data distribution. General deep learning methods assume that the data are independent and identically distributed, but in the MACR task of this dissertation, domain shift degrades recognition performance. A MACR method based on domain adaptation and cross-domain fusion is therefore proposed, which includes the alignment of domain deep features and of class-center deep features. To alleviate the negative transfer that results from the inconsistent number of categories, cross-domain fusion is proposed to update the target domain with the full set of classes and to enhance the source domain with high-confidence pseudo-labeled samples. Compared with the basic methods and the MME method, the recognition results are further improved.

In summary, this dissertation studies and discusses the multi-scene ancient Chinese text recognition task and conducts a large number of experiments: the human subjective recognition experiment, the basic method recognition experiments, the MME recognition experiments, and the experiments with the domain adaptation and cross-domain fusion based method. The experimental results demonstrate the effectiveness of these methods, and the MACR problem is solved to a certain extent.
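The abstract does not say how MME combines its models, only that it exploits the positive correlation between confidence and accuracy. As a minimal sketch of one possible reading, the Python example below assumes that each model's confidence is its largest softmax probability and that the ensemble keeps the prediction of the most confident model. The function names `softmax` and `ensemble_predict`, the selection rule and the toy scores are all illustrative assumptions, not the dissertation's implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Pick the prediction of the most confident model.

    Assumption: confidence = largest softmax probability of a model.
    Returns (class_index, confidence, model_index).
    """
    best = None
    for model_idx, logits in enumerate(per_model_logits):
        probs = softmax(logits)
        conf = max(probs)
        cls = probs.index(conf)
        if best is None or conf > best[1]:
            best = (cls, conf, model_idx)
    return best


# Hypothetical scores from three models for one image over four classes.
logits = [
    [1.0, 1.2, 0.9, 1.1],  # model 0: nearly uniform, low confidence
    [0.2, 0.1, 4.0, 0.3],  # model 1: strongly favors class 2
    [2.0, 0.5, 0.4, 0.1],  # model 2: moderately favors class 0
]
cls, conf, model_idx = ensemble_predict(logits)
print(cls, model_idx, round(conf, 3))
```

Here the ensemble returns class 2 from model 1, because that model's top probability is the highest of the three. Other confidence-based rules, such as confidence-weighted voting, would fit the abstract's description equally well.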
Keywords/Search Tags:Multi-scene ancient Chinese text, Text recognition, Multi-model ensemble(MME), Domain adaptation