Multimodal Scene Classification Algorithm Based On Self-attention

Posted on:2023-02-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y Chang

GTID:2568306836472454

Subject:Electronic and communication engineering

Abstract/Summary:

Scene classification identifies the type of scene from background information about the surrounding environment, and it is an important part of environmental monitoring. In real environments, multiple events often occur at the same time and produce interfering information, which reduces classification accuracy. This thesis proposes a multimodal scene classification system based on the self-attention mechanism. The system combines early fusion with an attention mechanism to fuse audio-visual features twice (dual fusion), which effectively improves scene classification performance. The main work is as follows:

(1) Unimodal information is limited and one-sided, so an attention mechanism is used to fuse audio and visual features. On this basis, an attention-based scene classification system is proposed in which the two modalities assist each other in reaching a decision. Experimental results show that letting the modalities learn from each other effectively improves scene classification performance.

(2) Building on this work, early fusion of the audio-visual features is added, followed by deep fusion with the self-attention mechanism, to study how classification performance changes after this dual fusion. Experimental results show that the model captures multimodal features better, and it achieved good results in Task 1B of the DCASE 2021 Challenge.

Performance is measured by classification accuracy. The proposed self-attention-based multimodal scene classification system achieves an accuracy of 90.26% on the TAU Urban Audio-Visual Scenes 2021 dataset, a significant improvement over the baseline system.
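The abstract gives no architecture details, so the following is only a minimal NumPy sketch of the two fusion ideas it names, under stated assumptions. It assumes audio and visual features have already been extracted as time-aligned (T, d) sequences. Cross-modal attention, in which each modality queries the other, stands in for the "mutually assisted" fusion of (1). Concatenation followed by single-head scaled dot-product self-attention stands in for the dual fusion of (2). The function names, random weights, and sizes are hypothetical, and nothing is trained; n_classes = 10 matches the ten scene classes of TAU Urban Audio-Visual Scenes 2021.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    # Returns the attended values and the (len_q, len_k) weight matrix.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights


def cross_modal_fusion(audio: np.ndarray, video: np.ndarray) -> np.ndarray:
    # Hypothetical version of idea (1): each modality attends to the other,
    # and the result is added back as a residual before the streams are joined.
    # Requires audio and video to be (T, d) sequences with the same T and d.
    a_from_v, _ = attention(audio, video, video)
    v_from_a, _ = attention(video, audio, audio)
    return np.concatenate([audio + a_from_v, video + v_from_a], axis=-1)


def dual_fusion(audio, video, wq, wk, wv):
    # Hypothetical version of idea (2).
    # Step 1, early fusion: concatenate per-frame audio and visual features.
    early = np.concatenate([audio, video], axis=-1)  # (T, d_a + d_v)
    # Step 2, deep fusion: self-attention over the early-fused sequence,
    # with queries, keys and values all projected from the same input.
    fused, weights = attention(early @ wq, early @ wk, early @ wv)
    # Mean-pool over time to get one clip-level embedding.
    return fused.mean(axis=0), weights


def accuracy(pred: np.ndarray, labels: np.ndarray) -> float:
    # Fraction of clips whose predicted scene label matches the ground truth.
    return float(np.mean(pred == labels))


# Toy demo with random features; all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
T, d, d_model, n_classes = 10, 8, 16, 10
audio = rng.standard_normal((T, d))
video = rng.standard_normal((T, d))
wq, wk, wv = (rng.standard_normal((2 * d, d_model)) for _ in range(3))
w_out = rng.standard_normal((d_model, n_classes))

clip, w = dual_fusion(audio, video, wq, wk, wv)
probs = softmax(clip @ w_out)  # class probabilities from an untrained linear head
print(cross_modal_fusion(audio, video).shape)  # (10, 16)
print(clip.shape, w.shape)  # (16,) (10, 10)
print(accuracy(np.array([0, 3, 7, 2]), np.array([0, 3, 1, 2])))  # 0.75
```

Concatenating before self-attention lets every frame attend to both modalities within a single attention map. The cross-modal variant instead keeps separate audio and visual streams that exchange information only through attention.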

Keywords/Search Tags:

Multimodal Fusion, Attention, Audio-Visual Scene Classification, Deep Learning, Auxiliary Learning

Related items

1	Multimodal Scene Classification Based On Audio Image Collaboration
2	Audio-Visual Joint Human Action Recognition On Deep Learning
3	Visual Attention Models Based On Deep Learning For Scene Classification
4	Audio-Visual Multi-Modal Fusion Approach Research And Application
5	Research On Semantic Analysis And Understanding Of Multimodal Video
6	Research On Attention Based Image Classification With Deep Learning
7	Research On Multimodal Emotion Recognition Combining Audio And Text Based On Deep Learning
8	Research On Audio Scene Classification Method Based On Attention Mechanism And Deep Supervision
9	Research On Multimodal Deep Learning Algorithm Based On Attention Mechanism
10	Research On Multimodal Emotion Analysis Method Based On Deep Learning