
Synset Induction Based On Multimodal Representation

Posted on: 2022-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: G Chen
Full Text: PDF
GTID: 2558306914981469
Subject: Intelligent Science and Technology

Abstract/Summary:
Multimodal synset induction (MSI) aims to discover synonym instances from visual and textual information, and plays a fundamental role in downstream applications built on multimodal data. However, progress on synset induction for multimodal data has been slow, owing to the scarcity of available data resources and the noise inherent in multimodal data. Our work is therefore as follows.

First, to address the problems above, we propose a multimodal encoder, named TrimSyn, to learn multimodal representations. In TrimSyn, the visual and textual representation component uses a non-local attention mechanism to improve the relevance of the visual representation. The masking mechanism applies the results of the cross-modal interaction to filter out irrelevant information within the multimodal data. The gating component fuses the representations of the different modalities and generates the multimodal representation according to each modality's contribution to the conceptual semantics.

Second, to model the extreme imbalance in the scale of multimodal data, we propose an asymmetric multimodal encoder named A-TrimSyn. In addition to inheriting the non-local attention mechanism of TrimSyn, we design a multi-granularity embedding, which strengthens the textual representation through different embedding training schemes. We also propose an asymmetric masking component to filter out the noise within images. To train both models, we propose a triplet neural network framework that uses the triplet loss to learn the parameters of TrimSyn and A-TrimSyn. We then run a clustering algorithm over the learned multimodal representations to generate synsets, in which the terms of each synset refer to the same meaning.

Third, to validate the effectiveness of our models, we construct a multimodal synset dataset named MMAI-Synset. Specifically, starting from textual synsets, we collect the corresponding image-tag data from social media to build large-scale multimodal synset data. We evaluate the proposed methods with three types of metrics: clustering metrics based on information entropy, clustering metrics based on permutations, and metrics based on the similarity of clustering results. We further analyze the effectiveness of each component in TrimSyn and A-TrimSyn. Finally, we give a qualitative analysis of the multimodal synset induction results, which further validates the effectiveness of the proposed methods.
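The gating component described above fuses the visual and textual representations according to their contributions. The thesis does not give its exact form; the following is a minimal NumPy sketch of one common gating scheme, in which a sigmoid gate produces a per-dimension convex combination of the two modality vectors (all dimensions and weights here are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(v, t, W_g, b_g):
    """Fuse a visual vector v and a textual vector t of equal dimension.
    The gate g, computed from both modalities, decides per dimension how
    much the visual representation contributes to the fused output."""
    g = sigmoid(W_g @ np.concatenate([v, t]) + b_g)  # gate values in (0, 1)
    return g * v + (1.0 - g) * t                     # convex combination

rng = np.random.default_rng(0)
d = 4                                  # toy embedding dimension
v = rng.normal(size=d)                 # stand-in visual representation
t = rng.normal(size=d)                 # stand-in textual representation
W_g = rng.normal(size=(d, 2 * d))      # gate weights (learned in practice)
b_g = np.zeros(d)
m = gated_fusion(v, t, W_g, b_g)       # fused multimodal representation
```

Because the gate is a convex combination per dimension, every component of the fused vector lies between the corresponding visual and textual components, which is one simple way to let the model weight modalities without leaving their span.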
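The triplet neural network framework trains the encoders with a triplet loss. As a reference for readers unfamiliar with it, here is the standard margin-based triplet loss on Euclidean distances (a generic sketch, not the thesis's exact formulation; the margin value is illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: push the positive (e.g. a synonym term's
    representation) closer to the anchor than the negative (a
    non-synonym) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # near the anchor
n = np.array([1.0, 1.0])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
```

When the negative is already farther than the positive by more than the margin, the loss is zero and the triplet contributes no gradient; minimizing this loss therefore pulls synonym representations together and pushes non-synonyms apart, which is what makes the subsequent clustering step meaningful.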
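After training, a clustering algorithm over the learned representations groups terms into synsets. The thesis does not name the algorithm in this abstract; the toy sketch below uses a greedy cosine-similarity grouping (threshold, terms, and vectors are all invented for illustration) just to show the induction step's input/output shape:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def induce_synsets(terms, vecs, threshold=0.9):
    """Greedy complete-link grouping: a term joins the first cluster
    whose every member it is similar enough to; otherwise it starts
    a new synset."""
    synsets = []  # list of lists of term indices
    for i, v in enumerate(vecs):
        for cluster in synsets:
            if all(cosine(v, vecs[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            synsets.append([i])
    return [[terms[i] for i in c] for c in synsets]

terms = ["car", "automobile", "banana"]
vecs = [np.array([1.0, 0.0]),    # toy multimodal representations
        np.array([0.99, 0.05]),
        np.array([0.0, 1.0])]
print(induce_synsets(terms, vecs))  # [['car', 'automobile'], ['banana']]
```

Each output cluster plays the role of one induced synset: a set of terms whose representations, and hence meanings, are mutually close.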
Keywords/Search Tags: Deep Learning, Masking and Gating Mechanism, Multimodal Representation, Synset Induction