Research On Audio Scene Classification Method Based On Attention Mechanism And Deep Supervision

Posted on:2022-06-03

Degree:Master

Type:Thesis

Country:China

Candidate:W B Zhao

Full Text:PDF

GTID:2518306338485404

Subject:Information and Communication Engineering

Abstract/Summary:

Computer auditory scene analysis addresses how computers can perceive their surroundings through sound, as humans do. Audio scene classification is one of its sub-problems; its main goal is to design a system that correctly determines the type of scene in which a new audio signal was recorded. With the rapid development of artificial intelligence, audio scene classification has begun to be applied in industries such as intelligent surveillance, autonomous driving, and wearable smart devices, and it has become an active topic of academic research. Existing approaches to audio scene classification tend to ignore multi-domain features and the information held in the network's hidden layers. To address this, this thesis proposes a multi-attention fusion module and a multi-layer feature fusion network based on deep supervision, which mitigate the network's insufficient extraction of general feature information and improve classification accuracy. The main contributions of this thesis are as follows:

(1) A multi-attention fusion module is proposed. Audio data poses problems such as a variable starting point of the effective data interval and the mixing of audio from multiple scenes, so it is difficult to obtain high-performance general features when feedforward neural networks alone are used for feature extraction. This thesis therefore introduces an attention mechanism so that the network learns and extracts features adaptively. In addition, because previous work focused only on single-domain feature extraction, the proposed multi-attention fusion module integrates temporal and spatial features. Experimental results show that the module outperforms commonly used deep learning algorithms on the three evaluation metrics, with an AUC of 96.8%.

(2) A multi-layer feature fusion network based on deep supervision is proposed. Both the rich local feature information in the shallow layers of the network and the global description in its deep layers strongly affect classification performance. This thesis therefore combines shallow and deep features, uses deep supervision to learn local and global feature representations of the audio simultaneously, and fuses these features effectively. Extensive experiments on AudioSet show that the system outperforms the Google benchmark model, with an AUC of 97%.
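
The abstract does not give the architecture, so the following is only a minimal sketch of the two general ideas it names, not the thesis's actual method. It assumes the simplest forms: attention as a softmax-weighted sum over time frames, and deep supervision as a main classification loss plus down-weighted auxiliary losses from shallower layers. All function names and the `aux_weight` value are hypothetical, chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, scores):
    """Collapse per-frame feature vectors into one vector.

    Each frame is weighted by its softmax-normalized attention score,
    so frames the model scores as informative contribute more.
    """
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])

def deep_supervision_loss(main_probs, aux_probs_list, label, aux_weight=0.3):
    """Main-head loss plus a weighted loss for each auxiliary head.

    aux_weight is an assumed hyperparameter, not a value from the thesis.
    """
    loss = cross_entropy(main_probs, label)
    for aux_probs in aux_probs_list:
        loss += aux_weight * cross_entropy(aux_probs, label)
    return loss

# Toy input: three time frames with two features each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores give uniform attention weights
pooled = attention_pool(frames, scores)

# One main head and one auxiliary (shallow-layer) head, true class 0.
total_loss = deep_supervision_loss([0.5, 0.5], [[0.5, 0.5]], label=0, aux_weight=0.5)
```

In a trained network the attention scores and the class probabilities would be produced by learned layers; here they are fixed inputs so that the weighting and loss arithmetic can be checked by hand.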

Keywords/Search Tags:

Audio scene classification, deep learning, attention mechanism, deep supervision

Related items

1	Research On Attention Based Image Classification With Deep Learning
2	Study Of Attention-based Deep Models For Acoustic Scene Classification
3	Multimodal Scene Classification Algorithm Based On Self-attention
4	Research On Question Classification Based On Weak Supervision And Deep Learning
5	Research On Classification Of Acoustic Scenes Based On Deep Learning
6	Visual Attention Models Based On Deep Learning For Scene Classification
7	Research On Audio Event Classification Based On Deep Learning
8	Research On Methods Of Image Scene Classification And Small Target Segmentation Based On Deep Learning
9	Scene Classification Based On The Deep Learning
10	Research On Text Classification Model Based On Deep Learning And Attention Mechanism