
Research And Application Of Multimodal Learning For Heterogeneous Feature Fusion

Posted on: 2021-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J R Chen
Full Text: PDF
GTID: 2428330647451038
Subject: Computer Science and Technology
Abstract/Summary:
Recently, as the acquisition of multimodal data has become easier, large-scale multimodal datasets have facilitated the study of multimodal learning. Meanwhile, the development of deep learning has helped multimodal learning make a huge leap forward. Multimodal learning is a common application of deep learning in computer vision, for example in cross-modal image recognition and multimedia content analysis. The ubiquity of multimodal data and the progress of deep learning give the study of deep multimodal learning great theoretical significance and practical value. Multimodal feature fusion, one of the original topics in multimodal learning, is also its most widely applied research direction. This thesis focuses on deep feature fusion learning for multimodal data and presents two research works, summarized as follows:

The first work proposes a novel adversarial multimodal metric learning method for multimodal feature fusion. The method fully considers the relationships both within and between the multimodal feature views. The intra-view metric confuses the current metric by synthesizing hard negative samples, thereby improving the discriminative ability of each specific view. The inter-view metric eliminates inconsistent views and generates challenging inter-view samples, in order to mine the relations shared across views. The two adversarial modules (intra-view and inter-view) are combined to form a final feature representation for subsequent tasks, improving performance on challenging multimodal data tasks. Extensive experiments on several benchmark multimodal datasets demonstrate the effectiveness of the method; moreover, an RGB-D recognition task and a Face-Caricature recognition task show its excellent performance on deep features.

The second work, based on multimodal video data, proposes a multimodal feature fusion method for celebrity video identification. The model consists of several single-modal multi-layer perceptron (MLP) modules and a multimodal feature fusion module. The method preprocesses the multimodal data extracted from the videos and then trains one MLP submodel per modality. The features produced by the submodels are weighted and fused to complete the final identification task. By combining the information of the different modalities through the fusion module, the model achieves better identification results. The method copes with the huge number of videos and their complex information, and is suitable for multimodal data classification or identification tasks. Experiments on a large-scale celebrity video dataset show that the feature fusion strategy effectively improves the model's performance on video person identification. Without multi-model ensembling, a single model reaches an mAP (mean Average Precision) of 89.52%.
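The abstract does not give the exact form of the intra-view generator, but a common way to synthesize a hard negative is to interpolate a real negative toward the anchor in the embedding space, so that the triplet margin becomes tighter and the metric is forced to sharpen. The NumPy sketch below illustrates that idea under this assumption; the function names, the interpolation coefficient `alpha`, and the use of L2-normalized embeddings are all hypothetical choices for illustration, not the thesis's actual method.

```python
import numpy as np

def synthesize_hard_negative(anchor, negative, alpha=0.7):
    """Pull a real negative toward the anchor in feature space to make
    it harder to separate (alpha controls how much of the original
    negative is retained)."""
    hard = alpha * negative + (1.0 - alpha) * anchor
    # Re-project onto the unit sphere, since metric-learning embeddings
    # are commonly L2-normalized.
    return hard / np.linalg.norm(hard)

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss on Euclidean distances: penalize the anchor
    being closer to the negative than to the positive plus a margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D example: the synthesized negative sits closer to the anchor
# than the original negative, so it produces a larger (harder) loss.
a = np.array([1.0, 0.0])   # anchor embedding
p = np.array([0.8, 0.6])   # positive (same identity)
n = np.array([0.0, 1.0])   # easy negative (different identity)
hard_n = synthesize_hard_negative(a, n)
```

In the adversarial setup described in the abstract, a generator producing such samples and the view-specific metric would be trained against each other; here the synthesis is just a fixed interpolation to keep the sketch minimal.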
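The second work's pipeline, one MLP submodel per modality followed by weighted feature fusion, can be sketched in a few lines of NumPy. All dimensions, the two-layer MLP architecture, and the fusion weights below are hypothetical placeholders; the abstract does not specify them.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ModalityMLP:
    """A single-modal feature extractor: a small two-layer MLP with
    randomly initialized weights (stand-in for a trained submodel)."""
    def __init__(self, in_dim, hidden_dim, out_dim, rng):
        self.w1 = rng.standard_normal((in_dim, hidden_dim)) * 0.1
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.standard_normal((hidden_dim, out_dim)) * 0.1
        self.b2 = np.zeros(out_dim)

    def forward(self, x):
        return relu(x @ self.w1 + self.b1) @ self.w2 + self.b2

def weighted_fusion(features, weights):
    """Fuse per-modality feature vectors of equal dimension by a
    weighted sum; weights are normalized to sum to 1."""
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical example: fuse face-frame and audio features from one
# video clip into a single 32-dimensional identification feature.
rng = np.random.default_rng(0)
face_mlp = ModalityMLP(in_dim=128, hidden_dim=64, out_dim=32, rng=rng)
audio_mlp = ModalityMLP(in_dim=40, hidden_dim=64, out_dim=32, rng=rng)

face_feat = face_mlp.forward(rng.standard_normal(128))
audio_feat = audio_mlp.forward(rng.standard_normal(40))
fused = weighted_fusion([face_feat, audio_feat], weights=[0.6, 0.4])
```

The fused vector would then feed a classifier over celebrity identities; because each submodel maps its modality into a common output dimension, adding or dropping a modality only changes the list passed to `weighted_fusion`.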
Keywords/Search Tags: Multimodal Machine Learning, Multimodal Fusion, Multimodal Feature Fusion, Deep Learning