Korean ancient books are an important carrier of historical, political, cultural, and other information about the development of the Korean people over thousands of years. Advanced character detection methods for text images can support both the regenerative restoration of Korean ancient books and OCR in their digitization. To date, considerable progress has been made in the domestic digitization of ancient books, especially Chinese, Tibetan, Mongolian and Yi ancient books, but the digitization of Korean ancient books lags seriously behind, and very little research literature exists on character detection methods for them. Character detection in Korean ancient books faces several challenges: font size and shape, writing rules and layout differ within text images that mix Chinese and Korean characters; font sizes vary within the same text row or column; and the pages have degraded during preservation.

Therefore, an improved HRCenterNet model for character detection in Korean ancient books is proposed, and a prototype system for detecting and segmenting Korean ancient books is designed and implemented on the basis of this model.

First, to detect characters in Korean ancient book images accurately, this dissertation introduces the Involution operator into the residual structure of HRCenterNet to improve the ability of the HRCenterNet baseline model to extract global features of text. On this basis, a channel attention mechanism, ECA (Efficient Channel Attention), is introduced to evaluate the importance of channels at different resolutions. The resulting IENeck module, built on the Involution operator and ECA, can thus not only learn the importance of each channel automatically through attention, but also improve the local receptive field of the model through the Involution operator. This dissertation then verifies the feasibility of introducing the IENeck module to obtain more accurate detection
results for Korean ancient books through experiments.

Second, to address the shortage of annotated datasets of Korean ancient books, a model pre-trained on a Chinese ancient books dataset is transferred to the task of character detection in Korean ancient books. The main basis for introducing transfer learning is that both Korean and Chinese characters are block characters. Although Chinese characters are pictographs and Korean characters are alphabetic, the two scripts have certain structural similarities. Therefore, the character structure features extracted by the pre-trained model in the source domain are transferred to the target domain of Korean character detection, which helps improve the performance of the target-domain model without negative transfer.

The experimental results on Korean ancient character detection show that at IoU thresholds of 0.7 and 0.8, the precision, recall and F1 score of the model with IENeck are significantly better than those of the baseline model and other typical object detection models. After the transfer-learning pre-trained model is further adopted, the performance indicators of the detection model improve significantly, and the higher the threshold, the larger the improvement. In addition, the experimental results on Chinese ancient character detection show that introducing the IENeck module also improves the accuracy of Chinese ancient character detection. Based on these results, the detection method proposed in this dissertation can accurately detect the boundary positions of characters and meets the requirements of the detection task for Korean ancient books.

Finally, a prototype system for the detection and segmentation of Korean ancient books is designed and implemented, with the improved HRCenterNet model proposed in this dissertation as its core technology. The main modules of the system design meet the basic
functional requirements of practical applications. The system test results show that the prototype system can detect and segment Korean ancient books and has good practical value and potential for wider adoption.
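
The evaluation described above reports precision, recall and F1 at fixed IoU thresholds. As a rough illustration of how such scores are commonly computed for character boxes, the following is a minimal sketch, not the evaluation code used in this dissertation. The box format `(x_min, y_min, x_max, y_max)`, the greedy one-to-one matching rule, and the names `iou` and `detection_scores` are assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_scores(
    preds: List[Box], truths: List[Box], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall and F1 at one IoU threshold.

    Each predicted box is greedily matched to the unmatched ground-truth
    box with the highest IoU; the match counts as a true positive only if
    that IoU reaches the threshold (hypothetical matching rule).
    """
    matched = set()
    true_positives = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            value = iou(p, t)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1
```

With this matching rule, a predicted box that is shifted slightly from its character counts as correct at a threshold of 0.7 but may be rejected at a stricter threshold, which is why scores at higher thresholds reflect how tightly the predicted boundaries fit the characters.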