Research On Audio Scene Classification Method Based On Attention Mechanism And Deep Supervision

Posted on: 2022-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: W B Zhao
GTID: 2518306338485404
Subject: Information and Communication Engineering
Abstract/Summary:
Computer auditory scene analysis addresses the problem of how computers can perceive their surroundings through sound, as humans do. Audio scene classification is one of its sub-problems: its main purpose is to design a system that can correctly determine the type of scene in which a new audio signal was recorded. With the rapid development of artificial intelligence, audio scene classification has begun to be applied in many industries, such as intelligent surveillance, autonomous driving, and wearable smart devices, and has become an active topic of academic research. Existing solutions to audio scene classification tend to ignore multi-domain features and the information held in the hidden layers of the network. To address this, this paper proposes a multi-attention fusion module and a multi-layer feature fusion network based on deep supervision, which alleviate the insufficient extraction of general feature information by the network model and improve classification accuracy. The main contributions of this paper are as follows:

(1) A multi-attention fusion module is proposed. Audio data poses problems such as a variable starting point of the effective data interval and audio in which multiple scenes are mixed. It is difficult to obtain high-performance general features by simply using feedforward neural networks for feature extraction. Therefore, this paper introduces the attention mechanism so that the network can learn and extract features adaptively. In addition, because previous researchers focused only on single-domain feature extraction, this paper proposes a multi-attention fusion module that integrates temporal and spatial features. The experimental results show that the module outperforms commonly used deep learning algorithms on the three evaluation metrics, with an AUC of 96.8%.

(2) A multi-layer feature fusion network based on deep supervision is proposed. The rich local feature information in the shallow layers of the network and the global description in the deep layers both have a considerable impact on classification performance. Therefore, this paper combines shallow and deep features, uses deep supervision to learn local and global feature representations of the audio simultaneously, and effectively fuses these features. The network was evaluated extensively on the Audio Set dataset, and the results show that it outperforms the Google benchmark model, with an AUC of 97%.
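The two ideas above can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the abstract gives no architectural details, so the function names (`temporal_attention_pool`, `deep_supervision_loss`), the softmax-weighted pooling over time frames, the binary cross-entropy objective, and the auxiliary weight `aux_weight` are all assumptions chosen for illustration. The sketch shows (a) attention as a learned weighting over time frames and (b) deep supervision as a main loss plus weighted losses from auxiliary classifiers attached to shallower layers.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy between logits and 0/1 targets."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))


def temporal_attention_pool(features, scores):
    """Pool frame features of shape (T, D) into one vector of shape (D,).

    `scores` has shape (T,) and would normally come from a small learned
    layer; here it is passed in directly (hypothetical interface).
    """
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ features


def deep_supervision_loss(main_logits, aux_logits_list, targets, aux_weight=0.3):
    """Main-head loss plus weighted losses from auxiliary (shallower) heads.

    `aux_weight` is an illustrative value, not one taken from the thesis.
    """
    loss = bce(main_logits, targets)
    for aux_logits in aux_logits_list:
        loss += aux_weight * bce(aux_logits, targets)
    return loss


# Usage: two frames with two features each, and equal attention scores.
frames = np.array([[1.0, 2.0], [3.0, 4.0]])
pooled = temporal_attention_pool(frames, np.zeros(2))

# Usage: one main head and one auxiliary head over two scene labels.
targets = np.array([1.0, 0.0])
total = deep_supervision_loss(np.array([2.0, -2.0]), [np.array([0.5, -0.5])], targets)
```

With equal scores the pooling reduces to a plain average over frames; unequal scores let the model emphasize the frames that carry the scene-relevant sound, which is one way to cope with a variable starting point of the effective interval. In the loss, the auxiliary terms give shallow layers a direct training signal in addition to the gradient that reaches them through the main head.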
Keywords/Search Tags: Audio scene classification, deep learning, attention mechanism, deep supervision