Research On Cross-modal Feature Extraction And Fast Retrieval Algorithm For Geo-images

Posted on: 2020-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: X J Fu
GTID: 2428330590471701
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of sensor technology and mobile devices has produced a flood of images, which is a great opportunity for image retrieval. Retrieval based on low-level visual features such as color, texture and shape yields coarse results and low accuracy, so semantic-based image retrieval is gradually becoming popular. Label information, as one kind of semantic information, is commonly used in existing image retrieval applications: target images are retrieved through tags obtained by manual labeling or from text such as the image title. However, such methods require a great deal of manual effort. In addition, information carriers take diverse forms, and the same semantics are represented differently in different carriers, so a computer that learns from a single modality may acquire biased semantics. This thesis uses location information as a bias to narrow the scope of semantic extraction, combines images and texts, and applies self-attention to both modalities so that they interact during feature extraction. This process combines global and local information and extracts more topic-related semantic features. The text and image features extracted by the cross-modal inference framework are used to train a tree model, in which data of different modalities are clustered into the same cluster and sorted by cosine distance. The Tree-Kmeans algorithm is used to speed up labeling. The main contents of our work are as follows:

1. Cross-modal feature extraction algorithm for geo-images. A cross-modal reasoning framework, drawing on previous experience in deep learning, was designed to explore more accurate semantic information in geo-images. The 50-layer Residual Network (ResNet-50) was selected as the base network to extract deep image features. A TF-IDF vector, which captures global and local term-frequency information, is combined with an embedding vector that carries semantic information. The image vector and text vector are mapped to the same vector space, and the self-attention mechanism acts on both so that they interact during feature extraction and are inferred jointly to obtain the dependency relations between words. Location information is used for auxiliary training, so that more accurate semantic features are extracted.

2. Fast retrieval algorithm for geo-images. The cross-modal feature extraction algorithm is used to extract high-dimensional feature vectors of images and words. A tree model is then built, in which image vectors and word vectors are recursively clustered together in the form of a tree. In the matching process, cosine distance is used to find the nearest subclass, and the tree clustering algorithm (Tree-Kmeans) speeds up and completes the final annotation.
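
The tree-based retrieval idea in item 2 can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes image and text features have already been mapped to a shared space, uses tiny hand-made 2-D vectors in place of real features, and all names (`build_tree`, `search`, the labels) are hypothetical. It recursively splits the vectors with k-means and, at query time, descends to the most similar centroid and ranks the leaf by cosine similarity.

```python
import math
import random


def normalize(v):
    """Scale v to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid with the highest cosine similarity to v."""
    return max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))


def kmeans(vectors, k, rng, iters=10):
    """Plain k-means on unit vectors, using cosine similarity for assignment."""
    centroids = [vectors[i] for i in rng.sample(range(len(vectors)), k)]
    for _ in range(iters):
        assign = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return centroids, [nearest(v, centroids) for v in vectors]


def build_tree(items, k=2, leaf_size=2, rng=None):
    """Recursively cluster (label, unit_vector) pairs into a tree."""
    rng = rng or random.Random(0)
    if len(items) <= leaf_size:
        return {"items": items}
    n_clusters = min(k, len(items))
    centroids, assign = kmeans([vec for _, vec in items], n_clusters, rng)
    groups = [[it for it, a in zip(items, assign) if a == c]
              for c in range(n_clusters)]
    if any(len(g) == len(items) for g in groups):
        return {"items": items}  # clustering could not split further
    children = [(centroids[c], build_tree(g, k, leaf_size, rng))
                for c, g in enumerate(groups) if g]
    return {"children": children}


def search(tree, query):
    """Descend to the most similar centroid, then rank the leaf by cosine."""
    q = normalize(query)
    node = tree
    while "children" in node:
        node = max(node["children"], key=lambda ch: cosine(q, ch[0]))[1]
    return sorted(node["items"], key=lambda it: cosine(q, it[1]), reverse=True)


# Toy data: image and text vectors assumed to share one feature space.
data = [("river-img", [1.0, 0.0]), ("river-text", [0.9, 0.1]),
        ("mountain-img", [0.0, 1.0]), ("mountain-text", [0.1, 0.9])]
items = [(label, normalize(vec)) for label, vec in data]
tree = build_tree(items)
print([label for label, _ in search(tree, [1.0, 0.02])])
```

Because a query only compares against the centroids along one root-to-leaf path and the items in one leaf, the number of cosine computations grows roughly with the tree depth rather than with the size of the whole collection, at the cost of possibly missing neighbors that fell into another branch.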
Keywords/Search Tags: geo-image, cross-modal, self-attention, Tree-Kmeans