Research On Visual Language Model For Historical Mongolian Document Images Retrieval

Posted on:2018-12-22

Degree:Master

Type:Thesis

Country:China

Candidate:X Guo

GTID:2348330515452368

Subject:Computer Science and Technology

Abstract/Summary:

With the development of digital technology, more and more historical Mongolian documents in the Inner Mongolia Autonomous Region have been converted into digital images in order to preserve them for as long as possible. However, the scanned images lack indexing, which makes content-based retrieval impossible. As a result, the use and dissemination of these documents are restricted. This dissertation therefore studies retrieval technology for historical Mongolian document images, with the aim of making the documents easier to mine and use.

In the field of image retrieval, the bag-of-visual-words (BoVW) model has attracted considerable attention in recent years. However, BoVW has two major drawbacks. First, it captures no semantic information between visual words, which leads to the semantic gap problem. Second, it treats visual words as independent of one another, so the spatial information between neighboring visual words is ignored. This dissertation proposes a solution to both drawbacks, detailed as follows.

(1) To address the lack of semantic information between visual words in the BoVW framework, a visual language model is proposed. First, the scanned Mongolian Kanjur images are segmented into individual word images. Second, local descriptors (i.e., visual words) are extracted from each word image. Each word image is then represented as a probability distribution of visual words along its writing direction, with a smoothing scheme used to handle the zero-probability problem. When a query keyword image is provided, the query likelihood model (QLM) is used to calculate the similarity between the query keyword image and each word image. Finally, a ranked list of word images is produced.

(2) A spatial visual language model is proposed that represents word images by combining spatial information with the semantic information between visual words. First, each word image is divided into several equal-sized sub-regions along rows and columns; the division scheme is chosen according to the Mongolian writing style. Second, a visual language model is constructed for each sub-region. In the image matching phase, only corresponding sub-regions of the two word images are matched against each other. Finally, the similarity of the two word images is the sum of the similarities of these sub-regions.
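The two steps described above can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that each word image has already been reduced to a sequence of integer visual-word IDs, and that each sub-region is likewise a sequence of IDs. The abstract does not state the n-gram order or the smoothing scheme, so the sketch assumes a unigram model with Jelinek-Mercer interpolation against add-one-smoothed collection statistics. All function names and the parameters lam and vocab_size are hypothetical.

```python
import math
from collections import Counter


def smoothed_prob(word, doc_counts, doc_len, coll_counts, coll_len, vocab_size, lam):
    """P(word | doc), interpolated with collection statistics (assumed scheme)."""
    p_doc = doc_counts[word] / doc_len if doc_len else 0.0
    # Add-one smoothing keeps the collection probability strictly positive.
    p_coll = (coll_counts[word] + 1) / (coll_len + vocab_size)
    return (1 - lam) * p_doc + lam * p_coll


def query_log_likelihood(query, doc, coll_counts, vocab_size, lam=0.5):
    """Log-likelihood of the query's visual words under the document's model."""
    doc_counts = Counter(doc)
    coll_len = sum(coll_counts.values())
    return sum(
        math.log(smoothed_prob(w, doc_counts, len(doc), coll_counts,
                               coll_len, vocab_size, lam))
        for w in query
    )


def rank(query, docs, coll_counts, vocab_size, lam=0.5):
    """Return document names ordered from most to least similar to the query."""
    scores = {name: query_log_likelihood(query, d, coll_counts, vocab_size, lam)
              for name, d in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def spatial_score(query_regions, doc_regions, coll_counts, vocab_size, lam=0.5):
    """Sum of per-sub-region scores; only corresponding sub-regions are compared."""
    return sum(
        query_log_likelihood(q, d, coll_counts, vocab_size, lam)
        for q, d in zip(query_regions, doc_regions)
    )


# Toy data: visual-word IDs drawn from a hypothetical codebook of size 8.
docs = {"a": [1, 2, 3, 2], "b": [4, 5, 6, 5]}
coll_counts = Counter(w for d in docs.values() for w in d)
ranking = rank([1, 2, 2], docs, coll_counts, vocab_size=8)

# Same visual words, but placed in different sub-regions.
query_regions = [[1, 2], [3]]
aligned = spatial_score(query_regions, [[1, 2], [3, 2]], coll_counts, vocab_size=8)
swapped = spatial_score(query_regions, [[3], [1, 2]], coll_counts, vocab_size=8)
```

In this toy example the word image that shares visual words with the query ranks first, and the spatial score is higher when the shared visual words fall in corresponding sub-regions than when the same words appear in swapped sub-regions.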

Keywords/Search Tags:

Historical Mongolian Document, Document Image Retrieval, Bag of Visual Words, Visual Language Model, Spatial Pyramid Matching

Related items

1	Research On Retrieval Of Historical Mongolian Document Images
2	Research On Image Classification Of Optimized Spatial Pyramid Matching Model
3	Research Of Mongolian Historical Document Recognition
4	Spatial Contextual Information Based Image Retrieval
5	The Research Of Image Retrieval Technology Based On Hashing Methods
6	Research On Deep Learning For Historical Mongolian Document Images Retrieval
7	Research On 3D Model Retrieval Based On Supervised Bag-of-Visual-Words Framework
8	Research On Scene Classification Technologies With The Local Context Feature And Spatial Pyramid Model
9	Contextual Visual Feature Representation And Application
10	Study Method On Extraction Building Of High Resolution Image Based On The Bag Of Words Model