
Research On Visual Language Model For Historical Mongolian Document Images Retrieval

Posted on: 2018-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Guo
Full Text: PDF
GTID: 2348330515452368
Subject: Computer Science and Technology
Abstract/Summary:
With the development of digital technology, more and more historical Mongolian documents in the Inner Mongolia Autonomous Region have been converted into digital images for long-term preservation. However, the scanned images lack indexing, which makes content-based retrieval impossible and restricts the use and dissemination of these documents. This dissertation therefore investigates retrieval techniques for historical Mongolian document images, with the aim of making these documents easier to mine and use.

In the field of image retrieval, the bag-of-visual-words (BoVW) model has attracted increasing attention in recent years. However, BoVW has two major drawbacks. First, it captures no semantic information between visual words, which leads to the semantic gap problem. Second, it treats visual words as independent of each other and therefore ignores the spatial information between neighboring visual words. This dissertation proposes a solution that addresses both drawbacks, as follows.

(1) To address the lack of semantic information between visual words in the BoVW framework, a visual language model is proposed. First, the scanned Mongolian Kanjur images are segmented into individual word images. Second, local descriptors (i.e., visual words) are extracted from each word image. Each word image is then represented as a probability distribution of visual words along its writing direction, with a smoothing scheme used to handle the zero-probability problem. When a query keyword image is provided, the query likelihood model (QLM) is used to calculate the similarity between the query keyword image and each word image. Finally, the word images are ranked by similarity to form a ranked list.

(2) A spatial visual language model is proposed that represents word images by combining spatial information with the semantic information between visual words. First, each word image is divided into several equal-sized sub-regions along rows and columns; the specific division scheme is chosen according to the Mongolian writing style. Second, a visual language model is constructed for each sub-region. In the image matching phase, only the corresponding sub-regions of the two word images are matched against each other. Finally, the similarity of the two word images is the sum of the similarities of their sub-regions.
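The two models described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not say which smoothing scheme or model order is used, so the sketch assumes a unigram model with Jelinek-Mercer interpolation against an add-one-smoothed collection model. Each word image is assumed to be given already as a list of visual word IDs ordered along the writing direction. All function names, the `lam` weight, and the `vocab_size` parameter are hypothetical.

```python
import math
from collections import Counter


def build_collection(docs):
    """Count visual words over all word images (the background model)."""
    counts = Counter()
    for words in docs.values():
        counts.update(words)
    return counts, sum(counts.values())


def log_likelihood(query, doc, coll_counts, coll_len, vocab_size, lam=0.5):
    """log P(query | doc) under a smoothed unigram visual language model.

    Assumption: Jelinek-Mercer interpolation between the word image's own
    distribution and the collection distribution.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        p_doc = doc_counts[w] / doc_len if doc_len else 0.0
        # Add-one smoothing keeps the background probability above zero,
        # so the logarithm is always defined.
        p_coll = (coll_counts[w] + 1) / (coll_len + vocab_size)
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score


def rank(query, docs, vocab_size, lam=0.5):
    """Return word-image names sorted by descending query likelihood."""
    coll_counts, coll_len = build_collection(docs)
    scores = {
        name: log_likelihood(query, words, coll_counts, coll_len, vocab_size, lam)
        for name, words in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def spatial_log_likelihood(query_regions, doc_regions, coll_counts, coll_len,
                           vocab_size, lam=0.5):
    """Sum the scores of corresponding sub-regions only.

    Each argument is a list of sub-regions; each sub-region is a list of
    visual word IDs. Both images must use the same division scheme.
    """
    if len(query_regions) != len(doc_regions):
        raise ValueError("both images must have the same number of sub-regions")
    return sum(
        log_likelihood(q, d, coll_counts, coll_len, vocab_size, lam)
        for q, d in zip(query_regions, doc_regions)
    )
```

For example, `rank([2, 3], {"a": [1, 2, 3, 2], "b": [4, 5, 6, 4]}, vocab_size=8)` places `"a"` first, because only `"a"` contains the query's visual words. In the spatial variant, a word image whose sub-regions contain the right visual words in the wrong positions scores lower than one whose sub-regions line up with the query, which is the effect the sub-region matching is meant to capture.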
Keywords/Search Tags:Historical Mongolian Document, Document Image Retrieval, Bag of Visual Words, Visual Language Model, Spatial Pyramid Matching