Research On Cross Modal Retrieval Algorithm Based On Weakly Supervised Localization | Posted on: 2024-03-13 | Degree: Master | Type: Thesis | Country: China | Candidate: H C Yin | Full Text: PDF | GTID: 2568307151467124 | Subject: Communication Engineering (including broadband network, mobile communication, etc.) (Professional Degree)

Abstract/Summary:

With the rapid growth of multimedia information, falling storage costs, and the widespread use of high-speed Internet, a large amount of data in many different modalities has been generated. To make further use of this data, and to quickly find related information in other modalities from a query in one modality, cross-modal retrieval has emerged. Supervised learning is often used in cross-modal retrieval to improve retrieval reliability. Among supervised approaches, fully supervised learning has been widely applied to cross-modal retrieval and is well developed in computer vision, image retrieval, and other fields. However, full supervision requires a large amount of labeled data, which makes learning costly. Weakly supervised learning, by contrast, needs only incomplete, inexact, or inaccurate labels and aims to accomplish the same tasks with quality comparable to full supervision. The study of weakly supervised learning therefore has important practical significance.

First, inspired by weakly supervised object localization, this paper proposes a deep supervised cross-modal retrieval algorithm based on weakly supervised localization. Instead of relying on a global average pooling (GAP) layer, the algorithm uses gradients to weight the image feature maps, so that the most relevant image features are retained and image feature extraction becomes more accurate. A ResNet residual network is adopted to mitigate vanishing and exploding gradients. The algorithm was evaluated on the Pascal Sentence dataset, which is suitable for cross-modal retrieval. Compared with existing deep learning cross-modal retrieval algorithms, it achieved higher mean average precision (MAP) in both image-to-text and text-to-image retrieval tasks. These results show that the algorithm improves image feature extraction and, in turn, the accuracy of cross-modal retrieval.

Second, to address the differences between modalities and the insufficient mining of feature information, this paper proposes a semantic weakly supervised algorithm that uses weakly supervised localization to improve image feature extraction. The algorithm mines the labels of each modality and extracts as much latent semantic information from each modality as possible. Hash codes are optimized by modeling modal feature information and semantic label information consistently. An encoder learns the optimized hash function, reducing the influence of heterogeneity between modalities: the encoder first reduces the feature dimension, and a decoder then maps the reduced features back to the original space. This reconstruction step minimizes the reconstruction loss and keeps the learned hash function consistent between features and labels. Compared with existing cross-modal hashing retrieval algorithms, this algorithm improves cross-modal retrieval accuracy.

Finally, to address inaccurate image feature extraction and the resulting drop in retrieval accuracy caused by retaining only a single semantically coupled feature, this paper proposes an adversarial cross-modal retrieval algorithm with a weakly supervised attention mechanism, built on generative adversarial networks and inspired by weakly supervised attention mechanisms. The algorithm combines adversarial cross-modal retrieval with weakly supervised localization. A generative adversarial network is used to set up a modality classifier, and feature projectors are trained adversarially against it, which better eliminates the information imbalance between modalities and produces more consistent representations. Experimental results show that the weakly supervised attention mechanism better preserves and extracts important regional information, making cross-modal retrieval more accurate.

Keywords/Search Tags: weakly-supervised localization, hash learning, cross-modal retrieval, attention mechanism, autoencoder, adversarial network
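The first algorithm's idea of replacing the GAP layer with gradients and weighting image feature maps by those gradients closely resembles Grad-CAM. The NumPy sketch below illustrates that technique under this assumption; it is not the thesis's implementation. It takes last-layer convolutional activations plus the gradients of a score with respect to them (which would come from an autograd framework and a ResNet backbone in practice), averages each channel's gradient spatially to obtain a channel weight, and keeps the positive part of the weighted sum as a localization map. The names `grad_cam` and `weighted_descriptor` are hypothetical, and the re-pooling step in `weighted_descriptor` is an assumed way of turning the map into an image feature vector.

```python
import numpy as np


def grad_cam(features: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Grad-CAM-style localization map (assumed stand-in for the thesis's method).

    features: last-layer convolutional activations, shape (K, H, W)
    grads:    gradient of a score w.r.t. `features`, same shape
              (e.g. obtained by backpropagation in an autograd framework)
    returns:  non-negative (H, W) map scaled so its maximum is 1
    """
    if features.shape != grads.shape:
        raise ValueError("features and grads must have the same shape")
    # One weight per channel: the spatial mean of its gradient.
    # This replaces the weights a GAP + linear layer would otherwise provide.
    alpha = grads.mean(axis=(1, 2))                               # (K,)
    # Weighted sum over channels; ReLU keeps only positive evidence.
    cam = np.maximum(np.tensordot(alpha, features, axes=1), 0.0)  # (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def weighted_descriptor(features: np.ndarray, cam: np.ndarray) -> np.ndarray:
    """Hypothetical step: emphasize localized regions, then average-pool to a (K,) vector."""
    return (features * cam[np.newaxis, :, :]).mean(axis=(1, 2))
```

Because the channel weights come from gradients rather than from a trained GAP-plus-classifier head, the same map can be computed for any differentiable score, which is presumably why the approach fits a retrieval model without a dedicated classification layer.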
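The encoder-decoder hashing in the second algorithm can be made concrete as an objective with three terms. The NumPy sketch below is a minimal illustration under stated assumptions: a linear encoder `W` with a tied-weight decoder `W.T`, a hypothetical label projection `P`, and hypothetical weights `lam` and `mu`. The abstract does not give the thesis's actual loss or architecture. The sketch computes a reconstruction loss, a feature-label consistency term, and a quantization term, and binarizes the encoder output to obtain hash codes. Hamming distance between the resulting codes would then rank retrieval candidates.

```python
import numpy as np


def hash_codes(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Encoder used as a hash function: project to r dimensions, then binarize to {-1, +1}."""
    return np.where(X @ W >= 0, 1.0, -1.0)       # ties at 0 map to +1


def autoencoder_hash_loss(X, L, W, P, lam=1.0, mu=1.0) -> float:
    """Hypothetical objective for one modality.

    X: (n, d) features       L: (n, c) label matrix
    W: (d, r) encoder; the decoder reuses it transposed (tied weights)
    P: (c, r) projects labels into the same r-dimensional code space
    """
    Z = X @ W                                    # encoder: dimension reduction
    X_rec = Z @ W.T                              # decoder: back to the original space
    B = np.where(Z >= 0, 1.0, -1.0)              # binary codes
    rec = np.sum((X - X_rec) ** 2)               # reconstruction loss
    sem = np.sum((L @ P - Z) ** 2)               # feature/label consistency
    quant = np.sum((B - Z) ** 2)                 # keep real-valued codes near binary
    return float(rec + lam * sem + mu * quant)


def hamming(b1: np.ndarray, b2: np.ndarray) -> int:
    """Hamming distance between two {-1, +1} codes: (r - <b1, b2>) / 2."""
    return int(round((b1.size - float(b1 @ b2)) / 2))
```

For cross-modal use, one plausible arrangement is to give each modality its own encoder while sharing `P`, so that image and text codes are pulled toward the same label-derived targets and can be compared directly in Hamming space.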
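The adversarial component of the third algorithm trains a modality classifier against feature projectors. This follows the general pattern of adversarial cross-modal retrieval. The NumPy sketch below shows one round of that minimax game under simplifying assumptions. The modality classifier is a logistic regression (image = 1, text = 0), and the projectors `U_img` and `U_txt` are linear maps. The thesis's actual networks, label losses, and attention module are omitted, and all function names are hypothetical. The classifier takes a gradient-descent step on its cross-entropy. The projectors take a gradient-ascent step on the same loss, which pushes image and text features toward a distribution the classifier cannot separate.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _stack(h_img, h_txt):
    """Stack projected features and build modality labels (image = 1, text = 0)."""
    h = np.vstack([h_img, h_txt])
    y = np.concatenate([np.ones(len(h_img)), np.zeros(len(h_txt))])
    return h, y


def modality_loss(h_img, h_txt, w, b) -> float:
    """Binary cross-entropy of a logistic modality classifier."""
    h, y = _stack(h_img, h_txt)
    p = np.clip(sigmoid(h @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def discriminator_step(h_img, h_txt, w, b, lr=0.5):
    """Gradient descent: the classifier tries to tell the two modalities apart."""
    h, y = _stack(h_img, h_txt)
    err = sigmoid(h @ w + b) - y                 # dBCE/dlogit per sample (times n)
    return w - lr * (h.T @ err) / len(y), b - lr * err.mean()


def projector_step(X_img, X_txt, U_img, U_txt, w, b, lr=0.1):
    """Gradient ascent on the same loss: linear projectors try to fool the classifier."""
    h_img, h_txt = X_img @ U_img, X_txt @ U_txt
    n = len(h_img) + len(h_txt)
    err_img = (sigmoid(h_img @ w + b) - 1.0) / n   # dBCE/dlogit, image rows (y = 1)
    err_txt = sigmoid(h_txt @ w + b) / n           # dBCE/dlogit, text rows (y = 0)
    # Chain rule: logit = (X @ U) @ w + b, so dBCE/dU = X.T @ outer(err, w)
    grad_img = X_img.T @ np.outer(err_img, w)
    grad_txt = X_txt.T @ np.outer(err_txt, w)
    return U_img + lr * grad_img, U_txt + lr * grad_txt
```

Ascending the classifier's loss has the same effect as a gradient-reversal layer. In practice, the adversarial term would be combined with a semantic (label) loss, so that the projectors align the modalities without discarding the information retrieval needs.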