
Category Alignment Adversarial Learning and Fine-Grained Supplementary Feature Learning for Cross-Modal Retrieval

Posted on: 2022-06-06  Degree: Master  Type: Thesis
Country: China  Candidate: W Y Wang  Full Text: PDF
GTID: 2518306524980299  Subject: Computer Science and Technology
Abstract/Summary:
As a hot topic in current multimedia research, cross-modal retrieval meets users' need to search across different media types in the Internet era. Compared with traditional single-modal retrieval, the difficulty of cross-modal retrieval is that the heterogeneity of data from different modalities makes direct comparison infeasible. With the development of feature representation techniques in computer vision and natural language processing, learning feature representations for data of different modalities has become a key point of current cross-modal retrieval research. Another key issue is how to associate semantically related information across modalities. Beyond methodological advances, cross-modal retrieval has also grown in task difficulty: past work has been based on coarse-grained datasets, while cross-modal retrieval on fine-grained datasets remains to be studied. The most important difference between fine-grained and coarse-grained cross-modal retrieval is that the objects in the fine-grained setting are all sub-categories of the same category, for example, retrieving across 200 sub-species of birds. The difficulty of fine-grained cross-modal retrieval lies in the small inter-class differences: similar subcategories of the same category may have similar global appearances and similar text descriptions. Compared with traditional coarse-grained cross-modal retrieval, the fine-grained setting therefore places higher demands on the model. Addressing both the methods and the extended tasks of cross-modal retrieval, the work of this thesis is as follows:

(1) For the traditional coarse-grained image-text cross-modal task, this thesis proposes an adversarial learning method based on category information alignment. Embeddings generated from category information guide the alignment of features from different modalities in a common subspace, so that data features of different modalities can be compared directly there. The method also adopts a two-way training strategy in the training phase to improve the representation ability of the model. The method is validated on four traditional coarse-grained cross-modal datasets, and the results show that it significantly improves cross-modal retrieval performance and outperforms existing methods.

(2) For fine-grained image-text cross-modal retrieval, this thesis proposes a deep network with fine-grained supplementary features. Fine-grained feature extraction tools extract base features for images and texts; a frequent pattern mining algorithm then extracts fine-grained supplementary features for images, while a bidirectional encoder model extracts fine-grained supplementary features for text. Finally, a category label loss, an image-text modality matching loss, and a center clustering loss are introduced when constructing the common subspace, widening the gaps between classes and reducing the differences within each class. The method is validated on the existing fine-grained cross-modal retrieval dataset and on traditional cross-modal datasets, and the results show that it outperforms existing fine-grained cross-modal retrieval methods.
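To make the two contributions more concrete, the following minimal PyTorch sketches illustrate one plausible reading of the abstract. They are not the thesis's actual implementation; all module names, dimensions, and loss weights are assumptions. The first sketch shows the core of category-alignment adversarial learning as described in (1): features from both modalities are projected into a common subspace, pulled toward the embedding of their category, and trained to fool a modality discriminator.

```python
# Hedged sketch of category-alignment adversarial learning.
# Shapes, architectures, and names are illustrative assumptions.
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Projects a modality-specific feature into the common subspace."""
    def __init__(self, in_dim, common_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, common_dim), nn.ReLU(),
            nn.Linear(common_dim, common_dim),
        )
    def forward(self, x):
        return self.net(x)

class ModalityDiscriminator(nn.Module):
    """Predicts which modality a common-subspace feature came from."""
    def __init__(self, common_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(common_dim, common_dim // 2), nn.ReLU(),
            nn.Linear(common_dim // 2, 2),  # 0 = image, 1 = text
        )
    def forward(self, z):
        return self.net(z)

def alignment_losses(img_z, txt_z, labels, class_embed, disc):
    """Category-embedding alignment plus adversarial modality confusion."""
    ce = nn.CrossEntropyLoss()
    # Pull each projected feature toward the embedding of its category,
    # so category information guides alignment in the common subspace.
    target = class_embed(labels)  # nn.Embedding(num_classes, common_dim)
    align = ((img_z - target) ** 2).mean() + ((txt_z - target) ** 2).mean()
    # Generator side of the adversarial game: make image features look
    # like text to the discriminator, and vice versa.
    fool = ce(disc(img_z), torch.ones(len(img_z), dtype=torch.long)) + \
           ce(disc(txt_z), torch.zeros(len(txt_z), dtype=torch.long))
    return align, fool
```

The second sketch shows one plausible combination of the three losses named in (2): a category label loss, an image-text matching loss (here a simple in-batch triplet formulation, which may differ from the thesis's choice), and a center clustering loss that shrinks intra-class variance in the common subspace.

```python
# Hedged sketch of the three-loss objective named in the abstract.
# The concrete formulations and weights are illustrative guesses.
import torch
import torch.nn.functional as F

def combined_loss(img_z, txt_z, logits_img, logits_txt, labels, centers,
                  margin=0.2, w_cls=1.0, w_match=1.0, w_center=0.1):
    # Category label loss: classify both modalities with shared labels.
    cls = F.cross_entropy(logits_img, labels) + F.cross_entropy(logits_txt, labels)
    # Matching loss: a paired image/text should be closer than a
    # mismatched pair (negatives formed by rolling the batch).
    pos = F.pairwise_distance(img_z, txt_z)
    neg = F.pairwise_distance(img_z, txt_z.roll(1, dims=0))
    match = F.relu(pos - neg + margin).mean()
    # Center clustering loss: pull features toward their class centers,
    # reducing intra-class differences in the common subspace.
    c = centers[labels]  # centers: (num_classes, D) learnable tensor
    center = ((img_z - c) ** 2).mean() + ((txt_z - c) ** 2).mean()
    return w_cls * cls + w_match * match + w_center * center
```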
Keywords/Search Tags:Cross-modal Retrieval, Common Subspace, Adversarial Learning, Fine-grained Cross-modal Retrieval, Supplementary Feature Learning