Research On Cross-modal Feature Extraction And Fast Retrieval Algorithm For Geo-images

Posted on:2020-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:X J Fu

GTID:2428330590471701

Subject:Computer Science and Technology

Abstract/Summary:

The rapid development of sensor technology and mobile devices has produced a flood of images, which creates a great opportunity for image retrieval. Retrieval based on low-level visual features such as color, texture and shape yields coarse results and low accuracy, so semantic-based image retrieval has gradually become popular. Label information, one kind of semantic information, is widely used in existing image retrieval applications: image tags are obtained by manual labeling or from textual information such as the image title, and target images are then retrieved through these tags. Such methods, however, depend heavily on manual experience. In addition, information carriers take diverse forms, and the same semantics is represented differently in different carriers, so a computer that learns from a single modality may acquire biased semantics. This thesis uses location information as a bias to narrow the scope of semantic extraction, combines images with texts, and applies self-attention to both modalities so that they interact during feature extraction. The process combines global and local information and extracts features that carry more topic semantics. The text and image features produced by the cross-modal inference framework are used to train a tree model in which data of different modalities are clustered into the same class cluster and ranked by cosine distance, and the Tree-Kmeans algorithm is used to speed up labeling. The main contents of this work are as follows:

1. Cross-modal feature extraction algorithm for geo-images

Building on previous experience with deep learning, a cross-modal reasoning framework was designed to mine more accurate semantic information from geo-images. The 50-layer Residual Network (ResNet-50) is used as the backbone to extract deep image features. On the text side, a TF-IDF vector, which captures term-frequency information at both the global and the local level, is combined with an embedding vector that carries semantic information. The image and text vectors are mapped into the same vector space, and a self-attention mechanism is applied to both so that they interact during feature extraction and are reasoned over jointly to capture the dependencies between words. Location information is used for auxiliary training, which yields more accurate semantic features.

2. Fast retrieval algorithm for geo-images

The cross-modal feature extraction algorithm is used to extract high-dimensional feature vectors for images and words. A tree model is then built in which image vectors and word vectors are recursively clustered together, level by level. During matching, cosine distance is used to find the nearest subcluster, and the tree clustering algorithm (Tree-Kmeans) speeds up the search and completes the final annotation.
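
The abstract does not say exactly how the TF-IDF vector and the embedding vector are combined. One common option, shown here only as an assumption, is a TF-IDF-weighted average of word embeddings: TF is computed locally within one caption and IDF globally over the whole corpus. The function names `idf_table` and `tfidf_weighted_text_vector`, the skip-unknown-words policy and any toy embeddings are hypothetical, not the author's code.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Global part: inverse document frequency log(N / df) over tokenized texts."""
    n_docs = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in df.items()}


def tfidf_weighted_text_vector(doc, idf, embeddings, dim):
    """Average the embeddings of the words in `doc`, weighted by TF-IDF.

    TF is the local (within-document) frequency, IDF the global one.
    Words without an embedding are skipped (a hypothetical policy).
    """
    tf = Counter(doc)
    vec = [0.0] * dim
    total = 0.0
    for word, count in tf.items():
        if word not in embeddings:
            continue
        weight = (count / len(doc)) * idf.get(word, 0.0)
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x
        total += weight
    # Normalize by the total weight; an all-zero vector is returned if no word counts.
    return [x / total for x in vec] if total > 0 else vec
```

For a corpus of captions such as ["tower", "river"], ["tower", "park"] and ["bridge", "river"], the rarer word "park" gets a larger IDF than "tower", so its embedding dominates the vector for the caption ["tower", "park"]. A word that occurs in every caption receives an IDF of zero and contributes nothing.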
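
A minimal sketch of letting self-attention act on both modalities at once, assuming the image region features (for example, taken from ResNet-50) and the word vectors have already been projected into the same d-dimensional space. The two token sequences are concatenated and single-head scaled dot-product attention is applied, so each word can attend to image regions and each region to words. The projection matrices `w_q`, `w_k`, `w_v` and all function names are hypothetical; the abstract does not describe the actual layer layout, the number of heads, or how the location-based auxiliary training is attached.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def joint_self_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over both modalities.

    image_tokens and text_tokens are lists of d-dimensional vectors that are
    assumed to already live in the same space. Concatenating them lets every
    word attend to every image region and vice versa.
    """
    x = image_tokens + text_tokens
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    scores = [[sum(a * b for a, b in zip(qr, kr)) / scale for kr in k] for qr in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    out = matmul(weights, v)
    n_img = len(image_tokens)
    # Split the contextualized sequence back into image and text features.
    return out[:n_img], out[n_img:], weights
```

In a trained model `w_q`, `w_k` and `w_v` would be learned parameters and several heads would usually be stacked; identity matrices are enough to see that every attention row is a probability distribution spanning both the image positions and the word positions, which is the cross-modal interaction described above.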
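
The retrieval step can be illustrated as a hierarchical (tree-structured) k-means index over mixed image and word vectors: search descends to the child whose centroid has the highest cosine similarity to the query and then ranks the items in that leaf by cosine distance. This is a sketch under stated assumptions, not the thesis's Tree-Kmeans implementation: the deterministic farthest-first initialization, `k=2`, the leaf size of 4 and the hypothetical `tag:`/`img:` label prefixes are illustrative choices.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def init_centroids(items, k):
    """Deterministic farthest-first initialization (assumed, for reproducibility)."""
    chosen = [items[0][1]]
    while len(chosen) < k:
        # Pick the item least similar to every centroid chosen so far.
        far = min(items, key=lambda it: max(cosine(it[1], c) for c in chosen))
        chosen.append(far[1])
    return chosen


def kmeans(items, k, iters=10):
    """k-means that assigns each (label, vector) item by cosine similarity."""
    centroids = init_centroids(items, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for label, vec in items:
            best = max(range(k), key=lambda c: cosine(vec, centroids[c]))
            groups[best].append((label, vec))
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean([v for _, v in g]) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return centroids, groups


def build_tree(items, k=2, leaf_size=4):
    """Recursively split image and word vectors into a k-ary cluster tree."""
    if len(items) <= max(leaf_size, k):
        return {"leaf": items}
    centroids, groups = kmeans(items, k)
    if sum(1 for g in groups if g) < 2:  # degenerate split, stop here
        return {"leaf": items}
    return {"children": [(c, build_tree(g, k, leaf_size))
                         for c, g in zip(centroids, groups) if g]}


def search(tree, query, top_n=3, prefix=""):
    """Descend to the most similar leaf, then rank it by cosine distance."""
    node = tree
    while "children" in node:
        node = max(node["children"], key=lambda child: cosine(query, child[0]))[1]
    ranked = sorted(node["leaf"], key=lambda item: 1.0 - cosine(query, item[1]))
    return [label for label, _ in ranked if label.startswith(prefix)][:top_n]
```

For example, with four labeled vectors near [1, 0] and four near [0, 1], `search(tree, [0.0, 1.0], top_n=2, prefix="tag:")` descends into the second cluster and returns its two nearest tags, which is how an image vector would be annotated with words. Following a single branch keeps the number of similarity computations roughly logarithmic in the number of items for a balanced tree, at the cost of occasionally missing a neighbor that lies just across a cluster boundary.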

Keywords/Search Tags:

geo-image, cross-modal, self-attention, Tree-Kmeans

Related items

1	Research On Image-Text Cross-Modal Matching Based On Attention Mechanism
2	Image-text Translation Based On Cross-modal Related Semantics And Attention Mechanism
3	Cross-modal Retrieval Based On Deep Model Learning
4	Jointly Cross-and Self-modal Graph Attention Networks For Query-based Moment Retrieval In Videos
5	Deep Attention Based Cross-Modal Person Search Via Natural Language Descriptions
6	An Optimized Approach To Cross-Modal Retrieval Based On Multi-level Attention Mechanism
7	A Self Attention Guided Network For Cross-modal Matching
8	Cross-modal Retrieval Method Based On Dependence Relationship Attention And Social Information
9	Attention-aware Deep Cross-modal Hashing
10	Audio-Video Based Cross-modal Speaker Retrieval And Recognition