
Research On Segmentation-free Word Spotting Of Handwritten Ancient Documents

Posted on: 2019-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Qiu
GTID: 2428330566486084
Subject: Signal and Information Processing
Abstract/Summary:
In research on ancient documents, the documents are first digitized by scanning and stored as images. As the volume of data grows, a search system is needed. However, most of these documents are handwritten, and the traditional way of indexing their words requires a segmentation preprocessing step. Because handwriting is irregular, segmenting the words correctly is difficult. Segmentation-free word spotting has therefore become a major research direction. At present, the main difficulties of segmentation-free word spotting are the large variation in handwriting between different writers and the variation in word length. To avoid segmentation errors and improve indexing precision, this thesis studies segmentation-free methods:

(1) A feature based on a multi-layer convolutional network is proposed to improve precision. The network architecture is based on the one proposed by the Visual Geometry Group (VGG) and is used to extract convolutional features. During training and indexing, the system extracts multi-layer convolutional features from the index images and from negative samples. A trained Exemplar SVM (E-SVM) classifier then scores the region covered by each position of a sliding window. Tested on a twenty-page dataset containing 4,860 words, the system reaches a mAP of 57.6%, 6.8% higher than with HOG features.

(2) Multi-scale classifiers are proposed to improve precision on short words and to address scale variation. The system extracts features at different scales and trains three E-SVM classifiers with the Stochastic Gradient Descent (SGD) algorithm. Non-Maximum Suppression (NMS) removes regions that overlap one another and keeps the highest-scoring candidate regions. This method effectively improves the mAP for words shorter than five characters, reaching 52%, 2.7% higher than without multi-scale classifiers. Combining the multi-layer convolutional features with the multi-scale classifiers for indexing yields an overall mAP of 58.7%.
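The E-SVM (Exemplar SVM) classifiers described in the abstract are trained per query word: the query's feature vector is the single positive example and features from many other regions serve as negatives. A minimal sketch of training such a linear classifier with SGD follows, assuming features have already been extracted as plain Python lists (the VGG feature extractor is not shown). The names `train_esvm_sgd` and `score`, and the hyperparameters `c_pos`, `c_neg`, `lam`, `lr` and `epochs`, are hypothetical illustration choices, not values from the thesis.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_esvm_sgd(
    positive: Vector,
    negatives: List[Vector],
    c_pos: float = 0.5,   # hypothetical weight for the single exemplar
    c_neg: float = 0.01,  # hypothetical (smaller) weight for each negative
    lam: float = 1e-3,    # hypothetical L2 regularisation strength
    lr: float = 0.1,      # hypothetical constant learning rate
    epochs: int = 50,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Sketch of an Exemplar-SVM trained by SGD.

    Minimises lam/2 * ||w||^2 + mean_i c_i * max(0, 1 - y_i * (w.x_i + b)),
    where the one positive has y = +1 and weight c_pos, and every negative
    has y = -1 and weight c_neg.
    """
    rng = random.Random(seed)
    w = [0.0] * len(positive)
    b = 0.0
    samples = [(positive, 1.0, c_pos)] + [(x, -1.0, c_neg) for x in negatives]
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y, c in samples:
            margin = y * (dot(w, x) + b)
            # Sub-gradient of the regulariser plus this sample's weighted hinge loss.
            grad_w = [lam * wi for wi in w]
            grad_b = 0.0
            if margin < 1.0:
                grad_w = [g - c * y * xi for g, xi in zip(grad_w, x)]
                grad_b = -c * y
            w = [wi - lr * g for wi, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def score(w: Vector, b: float, x: Vector) -> float:
    """Detection score for the feature vector of one sliding-window region."""
    return dot(w, x) + b


# Toy 3-D "features": one query exemplar and three negative regions.
pos = [1.0, 0.8, 0.9]
negs = [[-0.5, 0.1, -0.3], [0.0, -0.6, 0.2], [-0.8, -0.2, -0.1]]
w, b = train_esvm_sgd(pos, negs)
print(score(w, b, pos) > max(score(w, b, n) for n in negs))  # True
```

In a full system, `score` would be evaluated on the feature vector of every sliding-window position, and the highest-scoring regions would become retrieval candidates. The asymmetric weights express the exemplar idea that one positive must hold its own against many negatives; the actual weighting used in the thesis is not stated in the abstract.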
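The sliding-window search and the Non-Maximum Suppression step from part (2) can be sketched in pure Python as follows. This assumes axis-aligned boxes `(x1, y1, x2, y2)`, a greedy NMS with an intersection-over-union (IoU) threshold of 0.5, and candidates from the scale-specific classifiers simply pooled before suppression. The abstract does not give the window sizes, stride, or overlap threshold, and the helper names `sliding_windows`, `iou`, `nms` and `merge_scales` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float]           # (box, classifier score)


def sliding_windows(page_w: int, page_h: int, win_w: int, win_h: int,
                    stride: int) -> Iterator[Box]:
    """Yield every position of a fixed-size window over a page image."""
    for y in range(0, page_h - win_h + 1, stride):
        for x in range(0, page_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(detections: List[Detection], iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    kept: List[Detection] = []
    for box, s in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) <= iou_threshold for k, _ in kept):
            kept.append((box, s))
    return kept


def merge_scales(per_scale: Dict[str, List[Detection]],
                 iou_threshold: float = 0.5) -> List[Detection]:
    """Pool candidates from every scale-specific classifier, then suppress overlaps."""
    pooled = [d for dets in per_scale.values() for d in dets]
    return nms(pooled, iou_threshold)


# Hypothetical scored candidates from two of the scale classifiers.
candidates = {
    "small": [((0, 0, 10, 10), 0.9), ((20, 0, 30, 10), 0.7)],
    "medium": [((1, 1, 11, 11), 0.8)],
}
print(merge_scales(candidates))
# [((0, 0, 10, 10), 0.9), ((20, 0, 30, 10), 0.7)]
```

Here the medium-scale box overlaps the top small-scale box with an IoU of about 0.68, so it is suppressed, while the non-overlapping box survives. Pooling before suppression lets a short word be claimed by whichever scale scores it best.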
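The mAP figures quoted in the abstract are the mean, over all queries, of each query's average precision (AP) on its ranked result list. A common definition, assumed here because the abstract does not spell out its evaluation protocol, is AP = (1/R) * sum over k of P(k) * rel(k), where R is the number of relevant regions for the query, P(k) is the precision of the top k results, and rel(k) is 1 if the k-th result is relevant. The sketch below takes, for each query, its relevance flags (already decided, for example by an overlap test against ground truth) together with R; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(relevance: Sequence[bool], n_relevant: int) -> float:
    """Sum of precision@k at each relevant rank k, divided by the number of relevant items."""
    if n_relevant <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision of the top `rank` results
    return total / n_relevant


def mean_average_precision(queries: List[Tuple[Sequence[bool], int]]) -> float:
    """mAP: the average of the per-query AP values."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)


# Two toy queries: relevance flags of the ranked results, plus the number of
# relevant regions that exist for each query word in the collection.
queries = [([True, False, True], 2), ([False, True], 1)]
print(round(mean_average_precision(queries), 4))  # 0.6667
```

Dividing by R rather than by the number of hits means that relevant words the system never retrieves still lower the score.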
Keywords/Search Tags: Machine Learning, Image Processing, Segmentation-free