
Synset Induction Based On Multimodal Representation

Posted on: 2022-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: G Chen
Full Text: PDF
GTID: 2558306914981469
Subject: Intelligent Science and Technology

Abstract/Summary:
Multimodal synset induction (MSI) aims to discover synonym instances from visual and textual information, and plays a fundamental role in downstream applications built on multimodal data. However, progress on synset induction for multimodal data has been slow, owing to the scarcity of available data resources and the noise inherent in multimodal data. Our work is therefore as follows.

First, to address the problems above, we propose a multimodal encoder, named TrimSyn, to learn multimodal representations. In TrimSyn, the visual and textual representation component uses a non-local attention mechanism to improve the relevance of the visual representation. The masking mechanism applies the results of the cross-modal interaction to filter out irrelevant information within the multimodal data. The gating component fuses the representations of the different modalities and generates the multimodal representation according to each modality's contribution to the conceptual semantics.

Second, to model the extreme imbalance in the scale of multimodal data, we propose an asymmetric multimodal encoder named A-TrimSyn. In addition to inheriting the non-local attention mechanism of TrimSyn, we design a multi-granularity embedding, which strengthens the textual representation through different embedding training schemes. We also propose an asymmetric masking component to filter out the noise within images. To train both models, we propose a triplet neural network framework that uses the triplet loss to learn the parameters of TrimSyn and A-TrimSyn. We then run a clustering algorithm over the learned multimodal representations to generate synsets, in which the terms of each synset refer to the same meaning.

Third, to validate the effectiveness of our models, we construct a multimodal synset dataset named MMAI-Synset. Specifically, starting from textual synsets, we collect the corresponding image-tag data from social media to build large-scale multimodal synset data. We evaluate the proposed methods with three types of metrics: clustering metrics based on information entropy, clustering metrics based on permutations, and metrics based on the similarity of clustering results. We further analyze the effectiveness of each component in TrimSyn and A-TrimSyn. Finally, we give a qualitative analysis of the multimodal synset induction results, which further validates the effectiveness of the proposed methods.
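The gating component described above fuses the visual and textual representations according to their contributions. The thesis does not give its exact form; the following is a minimal NumPy sketch of one common gating scheme, in which a sigmoid gate produces a per-dimension convex combination of the two modality vectors (all dimensions and weights here are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(v, t, W_g, b_g):
    """Fuse a visual vector v and a textual vector t of equal dimension.
    The gate g, computed from both modalities, decides per dimension how
    much the visual representation contributes to the fused output."""
    g = sigmoid(W_g @ np.concatenate([v, t]) + b_g)  # gate values in (0, 1)
    return g * v + (1.0 - g) * t                     # convex combination

rng = np.random.default_rng(0)
d = 4                                  # toy embedding dimension
v = rng.normal(size=d)                 # stand-in visual representation
t = rng.normal(size=d)                 # stand-in textual representation
W_g = rng.normal(size=(d, 2 * d))      # gate weights (learned in practice)
b_g = np.zeros(d)
m = gated_fusion(v, t, W_g, b_g)       # fused multimodal representation
```

Because the gate is a convex combination per dimension, every component of the fused vector lies between the corresponding visual and textual components, which is one simple way to let the model weight modalities without leaving their span.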
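The triplet neural network framework trains the encoders with a triplet loss. As a reference for readers unfamiliar with it, here is the standard margin-based triplet loss on Euclidean distances (a generic sketch, not the thesis's exact formulation; the margin value is illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: push the positive (e.g. a synonym term's
    representation) closer to the anchor than the negative (a
    non-synonym) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # near the anchor
n = np.array([1.0, 1.0])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
```

When the negative is already farther than the positive by more than the margin, the loss is zero and the triplet contributes no gradient; minimizing this loss therefore pulls synonym representations together and pushes non-synonyms apart, which is what makes the subsequent clustering step meaningful.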
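After training, a clustering algorithm over the learned representations groups terms into synsets. The thesis does not name the algorithm in this abstract; the toy sketch below uses a greedy cosine-similarity grouping (threshold, terms, and vectors are all invented for illustration) just to show the induction step's input/output shape:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def induce_synsets(terms, vecs, threshold=0.9):
    """Greedy complete-link grouping: a term joins the first cluster
    whose every member it is similar enough to; otherwise it starts
    a new synset."""
    synsets = []  # list of lists of term indices
    for i, v in enumerate(vecs):
        for cluster in synsets:
            if all(cosine(v, vecs[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            synsets.append([i])
    return [[terms[i] for i in c] for c in synsets]

terms = ["car", "automobile", "banana"]
vecs = [np.array([1.0, 0.0]),    # toy multimodal representations
        np.array([0.99, 0.05]),
        np.array([0.0, 1.0])]
print(induce_synsets(terms, vecs))  # [['car', 'automobile'], ['banana']]
```

Each output cluster plays the role of one induced synset: a set of terms whose representations, and hence meanings, are mutually close.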
Keywords/Search Tags: Deep Learning, Masking and Gating Mechanism, Multimodal Representation, Synset Induction