
Research On A Fine-grained Cross-media Retrieval Method Based On Adversarial Networks

Posted on: 2022-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Hong
Full Text: PDF
GTID: 2518306752997239
Subject: Computer technology
Abstract/Summary:
Cross-media retrieval returns results of a specified media type in response to a query of any media type. Previous cross-media retrieval algorithms are coarse-grained, so their results cover a wide range and are not accurate. Fine-grained cross-media retrieval algorithms, by contrast, can distinguish the subtle differences among subcategories and return precisely matched objects; fine-grained cross-media retrieval can therefore also be called subcategory cross-media retrieval. Two major challenges remain to be solved in fine-grained cross-media retrieval: (1) It is difficult to learn the subtle differences between fine-grained subcategories under the weak supervision of category labels, which greatly affects retrieval accuracy. (2) The heterogeneity gap (media gap), reflected in the inconsistent distributions and feature representations of different media, makes it difficult to compare the similarity of cross-media data directly.

This dissertation first reviews related research at home and abroad in recent years. Building on several mainstream fine-grained techniques and cross-media retrieval algorithms, it introduces the self-attention mechanism and entity labels to learn the discrimination among fine-grained subcategories, and then uses a generative adversarial network to narrow the heterogeneity gap among cross-media data. The main contributions are as follows:

This dissertation proposes a fine-grained cross-media retrieval method based on self-attention and generative adversarial networks. To address the "heterogeneity gap" among cross-media data, an improved generative adversarial network (GAN) is proposed. The network consists of a feature generator (G) and a media discriminator (D). Since the self-attention mechanism can locate the discriminative areas among subcategories, this dissertation uses two feature extractors based on the self-attention mechanism to process cross-media data. The features of the four media types are then fed into the common feature learning module. With the assistance of D, the media gap is narrowed and the common feature space is obtained. Finally, the cosine distance is used as the similarity measure to conduct fine-grained cross-media retrieval. Experiments conducted on various datasets show the effectiveness of the self-attention mechanism for learning fine-grained features.

This dissertation proposes a fine-grained cross-media retrieval algorithm based on an entity-level common feature space. Previous cross-media retrieval algorithms simply use the coarse-grained semantics of category labels, which leaves the meaning of features in the common space ambiguous. Considering the distinguishability, relevance, and scalability of entities, a common feature space based on entities is proposed. First, because video data contains noise frames, a noise frame removal algorithm based on spatial clustering is proposed to obtain cleaner video data. Then the entity-level common feature space is obtained under the constraints of the media discriminator and entity labels. As a result, the space combines the coarse-grained features of category labels with the fine-grained features of entity labels. Finally, the cosine distance is used to measure the similarity of cross-media data. Comparative experiments with many cross-media retrieval algorithms show that the proposed algorithm performs better, especially in video-related retrieval tasks.

A fine-grained cross-media retrieval system based on GAN is designed and implemented. The system includes five main modules: an operation settings module, a data preprocessing module, a results display module, an analysis module, and a results saving module. Overall, it implements the workflow of input visualization, cross-media retrieval, output visualization, and performance analysis.
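The final retrieval step described in the abstract, ranking items in a common feature space by cosine distance, can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the item identifiers, and the three-dimensional feature vectors are hypothetical, and the features are assumed to have already been mapped into the common space. Ranking by descending cosine similarity is equivalent to ranking by ascending cosine distance, since distance = 1 - similarity.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k gallery items, sorted by descending cosine similarity."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical common-space features for items of different media types.
gallery = {
    "image_001": [0.9, 0.1, 0.0],
    "video_014": [0.7, 0.7, 0.0],
    "audio_203": [0.0, 0.2, 0.9],
    "text_087": [1.0, 0.0, 0.1],
}

# Hypothetical query feature, assumed to live in the same common space.
query = [1.0, 0.0, 0.0]
ranking = retrieve(query, gallery, top_k=2)
```

Because every item is compared in one shared space, the same `retrieve` call serves any query media type against any target media type; the quality of the ranking depends entirely on how well the learned space has closed the media gap.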
Keywords/Search Tags:Fine-Grained Cross-Media Retrieval, Self-Attention Mechanism, Entity-Level Common Feature Space, Generative Adversarial Network