
Segmentation-free Word Spotting In Offline Handwriting Documents Using Two-directional Dynamic Time Warping

Posted on: 2017-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Yao
Full Text: PDF
GTID: 2308330485970919
Subject: Computer application technology
Abstract/Summary:
With the development of digital technology, an increasing number of paper documents, especially precious historical documents, are converted to image files for preservation, transmission, and access. Retrieving words from these document images has therefore become an urgent problem. Manuscripts contain a large variety of characters and arbitrary handwriting styles, which makes spotting keywords in them a challenging task. In this thesis, we propose a segmentation-free handwritten keyword spotting method based on two-directional dynamic time warping, and we discuss the detection of candidate areas, the character features, and the matching method. The major contributions of this work are as follows:

Firstly, we use the Canny edge detector to extract the outlines in the document images and join broken strokes by morphological closing. We then extract connected components from both the edge images and the binarized images, and generate new candidate areas by splitting and merging these connected components. In this way, we reduce the errors caused by word segmentation and make the method adapt to characters of different sizes.

Secondly, we adopt HoG descriptors to extract features and represent them with the bag-of-words model. HoG descriptors retain stroke direction information, and most stroke directions are distributed along the horizontal, vertical, and diagonal directions, so we use HoG descriptors with four orientations to extract features from the patches in the candidate areas. To tolerate local variations, we use the bag-of-words model to cluster similar character features into the same class.

Thirdly, we propose a two-directional dynamic time warping method to accommodate stroke offsets in the horizontal and vertical directions. We concatenate the features extracted from the same column into a vertical feature vector, and use dynamic time warping to calculate the horizontal similarity between two words. The vertical similarity is obtained in a similar way. The final result combines the horizontal and vertical similarities of the two words.

The proposed method has been tested on the George Washington dataset, the Bentham dataset, and the CASIA-HWDB 2.1 dataset. Compared with other methods, it achieves competitive performance. The experimental results show that the proposed segmentation-free method can spot words effectively.
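The two-directional matching in the third contribution can be illustrated with a minimal Python sketch. Several details here are assumptions, since the abstract does not state them: each word candidate is a grid of per-patch feature vectors (`grid[row][col]`), both grids have the same shape, the local cost is the Euclidean distance, the DTW cost is normalized by the summed sequence lengths, and the two directions are combined by a weighted sum with a hypothetical `weight` parameter. The functions return distances, so lower values mean more similar words. All function names are illustrative only.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW between two sequences of equal-length feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean cost (assumed)
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Normalize by the summed lengths (an assumed normalization).
    return acc[n][m] / (n + m)


def column_vectors(grid):
    """Concatenate the features of each column into one vector per column."""
    rows, cols = len(grid), len(grid[0])
    return [[v for r in range(rows) for v in grid[r][c]] for c in range(cols)]


def row_vectors(grid):
    """Concatenate the features of each row into one vector per row."""
    return [[v for cell in row for v in cell] for row in grid]


def two_directional_dtw(grid_a, grid_b, weight=0.5):
    """Combine horizontal and vertical DTW distances (weighted sum is assumed)."""
    # Horizontal direction: warp along the sequence of column vectors.
    horizontal = dtw_distance(column_vectors(grid_a), column_vectors(grid_b))
    # Vertical direction: warp along the sequence of row vectors.
    vertical = dtw_distance(row_vectors(grid_a), row_vectors(grid_b))
    return weight * horizontal + (1.0 - weight) * vertical


# Toy 2x3 grids with one-dimensional patch features (made-up values).
query = [[[1.0], [0.0], [0.0]],
         [[0.0], [1.0], [0.0]]]
shifted = [[[0.0], [1.0], [0.0]],
           [[0.0], [0.0], [1.0]]]

same_word = two_directional_dtw(query, query)    # identical grids give 0.0
other_word = two_directional_dtw(query, shifted)
```

In this sketch the patch features could be bag-of-words histograms of the four-orientation HoG descriptors; the warping in each direction is what lets small stroke offsets between the two words be absorbed.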
Keywords/Search Tags: handwritten keyword spotting, segmentation-free, HoG, two-directional dynamic time warping, bag of words