
Multimodal Scene Classification Algorithm Based On Self-attention

Posted on: 2023-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chang
GTID: 2568306836472454
Subject: Electronic and communication engineering
Abstract/Summary:
Scene classification assigns a scene label based on background information from the surrounding environment, and it is an important part of environmental monitoring. In real environments, multiple events often occur at the same time and generate interfering information, which reduces the accuracy of scene classification. This thesis proposes a multimodal scene classification system based on the self-attention mechanism. It combines early fusion with an attention mechanism to fuse audio-visual features twice, which effectively improves scene classification performance. The main work of this thesis is as follows:
(1) To address the limited, one-sided nature of unimodal information, this thesis uses an attention mechanism to perform multimodal fusion of audio-visual features. On this basis, it proposes an attention-based scene classification system in which the two modalities assist each other's decisions. Experimental results show that scene classification performance improves effectively once the modalities learn from each other's information.
(2) Building on that work, an early fusion step is added for the audio-visual features, and the self-attention mechanism is then used for deep fusion, in order to study how scene classification performance changes after dual fusion learning. Experimental results show that the model captures multimodal features better, and it achieved good results in DCASE Challenge 2021 Task 1B.
Results are evaluated by classification accuracy. The proposed multimodal scene classification system based on the self-attention mechanism achieved an accuracy of 90.26% on the TAU Urban Audio Visual Scenes 2021 dataset, a significant improvement in scene classification performance over the baseline system.
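The dual fusion idea described in the abstract (early fusion followed by self-attention) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the abstract gives no architectural details, so the feature dimensions, the random weights, the single attention head, the mean pooling, and every name below (`classify_scene`, `proj_a`, `proj_v`, and so on) are assumptions made purely for illustration. Early fusion is assumed here to mean projecting both modalities to a shared width and concatenating them into one token sequence.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def classify_scene(audio_feat, video_feat, p):
    """Hypothetical dual-fusion classifier (illustrative only)."""
    # Assumed early fusion: project each modality to a shared width,
    # then concatenate audio and video tokens into one sequence.
    tokens = np.concatenate(
        [audio_feat @ p["proj_a"], video_feat @ p["proj_v"]], axis=0
    )
    # Deep fusion: self-attention lets every token attend to both modalities.
    fused, weights = self_attention(tokens, p["w_q"], p["w_k"], p["w_v"])
    # Assumed head: mean-pool the fused tokens, then a linear layer.
    logits = fused.mean(axis=0) @ p["w_out"] + p["b_out"]
    return softmax(logits), weights


rng = np.random.default_rng(0)
d_audio, d_video, d_model, n_classes = 64, 128, 32, 10  # assumed sizes
params = {
    "proj_a": rng.normal(scale=0.1, size=(d_audio, d_model)),
    "proj_v": rng.normal(scale=0.1, size=(d_video, d_model)),
    "w_q": rng.normal(scale=0.1, size=(d_model, d_model)),
    "w_k": rng.normal(scale=0.1, size=(d_model, d_model)),
    "w_v": rng.normal(scale=0.1, size=(d_model, d_model)),
    "w_out": rng.normal(scale=0.1, size=(d_model, n_classes)),
    "b_out": np.zeros(n_classes),
}
audio = rng.normal(size=(5, d_audio))  # 5 placeholder audio frames
video = rng.normal(size=(3, d_video))  # 3 placeholder video frames
probs, attn = classify_scene(audio, video, params)
```

With untrained random weights the output probabilities carry no meaning. The sketch only shows the data flow: the attention matrix `attn` covers all eight tokens, so audio tokens can weight video tokens and vice versa.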
Keywords/Search Tags: Multimodal Fusion, Attention, Audio-Visual Scene Classification, Deep Learning, Auxiliary Learning