
Research On Word Spotting Technology In Handwritten Historical Document Images

Posted on: 2022-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhao
GTID: 2518306563978959
Subject: Computer Science and Technology

Abstract/Summary:
Archaeologists, historians, internet censors, and others need to retrieve regions of interest from documents such as handwritten historical manuscripts. Fast, real-time, and accurate word spotting built on deep learning and related techniques is therefore an urgent need, with broad application value in historical literature review, visual search, and image retrieval. However, annotating handwritten historical document images is difficult, time-consuming, and laborious, so training data are severely scarce and fall short of what deep learning models require. Handwritten historical document images also pose further challenges: varied writing styles, changeable visual appearance, and uneven backgrounds. In addition, they have special characteristics, such as densely distributed words and overlapping strokes, that can seriously degrade the localization and matching performance of deep learning methods.

This thesis therefore addresses three problems: the very small size of handwritten historical document image datasets, the low localization accuracy for word targets of different sizes, and the complex processing pipeline of multi-stage methods. Aiming to improve the model's localization and matching ability, it proposes two end-to-end, single-stage approaches to segmentation-free QbS (Query-by-String) word spotting. To achieve better retrieval performance, word localization and word matching are performed simultaneously by a unified network based on a multi-task learning mechanism. The main contributions and innovations of this thesis are summarized as follows:

(1) A segmentation-free QbS word spotting method based on direct regression. This method improves the model's localization and matching ability from three angles: data, network architecture, and loss function. At the data level, two data augmentation methods, IPA (in-place augmentation) and FPA (full-page augmentation), are adopted to compensate for the severe lack of training data. At the network architecture level, a residual network backbone is combined with a pyramid network structure to strengthen the model's feature extraction, and a multi-scale feature fusion strategy is proposed to improve prediction for word targets of different sizes. At the loss function level, three loss functions suited to this scenario are adopted for the three tasks, and a weighted loss function is designed to cope with densely distributed words and overlapping strokes in document images. Finally, comparison, ablation, and robustness experiments are performed on three public datasets. The results verify the effectiveness of the proposed method and confirm that the strategy chosen for each module is optimal.

(2) A segmentation-free QbS word spotting method based on attention mechanisms. This method designs a spatial attention mechanism and a scale attention mechanism that guide the model to focus on regions carrying more word information. The spatial attention mechanism helps the model handle word regions with varied writing styles and changeable visual appearance, improving its ability to discriminate these regions. The scale attention mechanism helps the model handle word targets of different sizes. Finally, comparison and ablation experiments verify the model's superior retrieval performance.

(3) Word spotting software prototypes for a web platform and a Windows client. These prototypes visualize word spotting retrieval results, making the specific tasks of word spotting intuitive to understand.
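In segmentation-free QbS word spotting, the query string and each predicted word region are typically mapped into a shared embedding space, and regions are ranked by their similarity to the query. The abstract does not name the embedding used; the sketch below assumes a PHOC (Pyramidal Histogram Of Characters) representation, a common choice in the word spotting literature, with a hypothetical alphabet and pyramid levels, and it simulates the network's per-box predictions with exact PHOCs. The names `phoc`, `cosine`, and `spot` are illustrative, not taken from the thesis.

```python
import math
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed character set
LEVELS = (1, 2, 3)  # assumed pyramid levels (published PHOCs often go deeper)


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Binary vector marking which characters occur in each horizontal split of the word."""
    word = word.lower()
    n = len(word)
    index = {ch: i for i, ch in enumerate(alphabet)}
    vec = []
    for level in levels:
        for region in range(level):
            r0, r1 = region / level, (region + 1) / level
            bins = [0.0] * len(alphabet)
            for k, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                c0, c1 = k / n, (k + 1) / n  # normalized span of character k
                overlap = max(0.0, min(r1, c1) - max(r0, c0))
                # a character belongs to a region if at least half of it lies inside
                if overlap / (c1 - c0) >= 0.5:
                    bins[index[ch]] = 1.0
            vec.extend(bins)
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def spot(query, detections, top_k=3):
    """Rank (box, predicted_embedding) pairs by cosine similarity to the query's PHOC."""
    q = phoc(query)
    scored = [(cosine(q, emb), box) for box, emb in detections]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]


# Hypothetical page: boxes as (x0, y0, x1, y1) with a "perfect" predicted embedding each.
boxes = {(10, 20, 80, 40): "letter", (90, 20, 150, 40): "better", (10, 50, 70, 70): "church"}
detections = [(box, phoc(w)) for box, w in boxes.items()]
print(spot("letter", detections, top_k=2))
```

With this idealized predictor, the box containing "letter" scores highest, and "better" outranks "church" because it shares more characters in the same relative positions. A trained network would instead output real-valued embeddings for each detected box.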
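Full-page augmentation (FPA) enlarges a small training set by composing new synthetic pages from existing word images. The abstract gives no implementation details, so the following is a minimal sketch under stated assumptions: word crops are described only by (label, width, height), the page is filled line by line with random horizontal gaps, and all sizes are hypothetical pixel values. Pasting the actual pixels onto a background image, and the in-place augmentation (IPA) counterpart, are omitted; `full_page_layout` is a hypothetical name.

```python
import random


def full_page_layout(words, page_w=1000, page_h=1400, margin=50,
                     line_h=60, gap=(10, 40), seed=None):
    """Place word crops on a synthetic page (hypothetical sizes, in pixels).

    words: list of (label, width, height); heights are assumed <= line_h.
    Returns a list of (label, (x0, y0, x1, y1)); words that no longer fit are dropped.
    """
    rng = random.Random(seed)
    placed = []
    x, y = margin, margin
    for label, w, h in rng.sample(words, len(words)):  # shuffled order per page
        if w > page_w - 2 * margin:
            continue  # word wider than the usable text width
        if x + w > page_w - margin:  # wrap to the next text line
            x, y = margin, y + line_h
        if y + line_h > page_h - margin:
            break  # page is full
        top = y + (line_h - h) // 2  # vertically centre the word in its line
        placed.append((label, (x, top, x + w, top + h)))
        x += w + rng.randint(*gap)  # random spacing between neighbouring words
    return placed


word_crops = [("w%d" % i, 80 + 10 * (i % 5), 30 + 5 * (i % 3)) for i in range(40)]
page = full_page_layout(word_crops, seed=0)
print(len(page), page[:2])
```

Each call with a different `seed` yields a new layout, and the returned boxes double as ground-truth localization labels for the synthetic page.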
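The direct-regression method trains three tasks jointly and combines their losses with weights. The abstract names neither the tasks nor the individual losses, so the sketch below assumes a common arrangement: smooth L1 for box regression, binary cross-entropy for word/background classification, and binary cross-entropy on a binary word embedding such as a PHOC. The fixed scalar weights `w_box`, `w_obj`, and `w_emb` are hypothetical placeholders; the thesis's own weighting scheme for dense words and overlapping strokes is not described in the abstract and is not reproduced here.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def multitask_loss(box_pred, box_gt, obj_prob, obj_gt, emb_prob, emb_gt,
                   w_box=1.0, w_obj=1.0, w_emb=1.0):
    """Weighted sum of three assumed task losses; also returns the per-task terms."""
    l_box = smooth_l1(box_pred, box_gt)   # box coordinates
    l_obj = bce(obj_prob, obj_gt)         # word vs. background
    l_emb = bce(emb_prob, emb_gt)         # binary word embedding (e.g. PHOC)
    total = w_box * l_box + w_obj * l_obj + w_emb * l_emb
    return total, (l_box, l_obj, l_emb)


total, parts = multitask_loss([1.0, 2.0], [1.0, 2.0], [0.5], [1.0],
                              [0.5, 0.5], [1.0, 0.0], w_box=2.0, w_emb=0.5)
print(total, parts)
```

Raising one weight makes the optimizer favour that task; returning the per-task terms alongside the total makes it easy to monitor each one during training.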
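The attention-based method uses spatial attention to emphasise locations that carry word information and scale attention to weight features from different scales. The toy version below operates on nested Python lists shaped C x H x W. The spatial branch is a simplified, CBAM-style gate: a sigmoid over a weighted sum of the channel mean and channel max at each location, using a 1x1 combination instead of the larger convolution CBAM uses. The scale branch applies a per-location softmax over scales, with each scale's channel mean as a parameter-free stand-in for a learned scoring layer. Both are assumptions for illustration, not the thesis's exact design, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def spatial_attention(fmap, w_avg=1.0, w_max=1.0, bias=0.0):
    """fmap: C x H x W. Returns (re-weighted map, H x W attention mask)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [fmap[c][i][j] for c in range(C)]
            avg, mx = sum(vals) / C, max(vals)  # channel-wise statistics
            mask[i][j] = sigmoid(w_avg * avg + w_max * mx + bias)
    out = [[[fmap[c][i][j] * mask[i][j] for j in range(W)] for i in range(H)]
           for c in range(C)]
    return out, mask


def scale_attention(scale_maps):
    """scale_maps: K maps, each C x H x W, already resized to a common H x W.
    Fuses them with a per-location softmax over scales."""
    K = len(scale_maps)
    C, H, W = len(scale_maps[0]), len(scale_maps[0][0]), len(scale_maps[0][0][0])
    fused = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            scores = [sum(m[c][i][j] for c in range(C)) / C for m in scale_maps]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]  # shift for stability
            z = sum(exps)
            weights = [e / z for e in exps]
            for c in range(C):
                fused[c][i][j] = sum(weights[k] * scale_maps[k][c][i][j] for k in range(K))
    return fused


out, mask = spatial_attention([[[0.0, 5.0]], [[0.0, 3.0]]])
print(mask)
```

In a real network the weights `w_avg`, `w_max`, `bias`, and the scale scores would be learned, and the multi-scale maps would first be resized to a common resolution.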
Keywords/Search Tags: Word spotting, Word embedding, Data augmentation, Multi-scale feature fusion, Multi-task learning, Attention mechanism, Deep learning