
Research On Sound Event Detection And Localization Method Based On Multi-Scale Convolutional Neural Network

Posted on: 2024-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Zhou
Full Text: PDF
GTID: 2568307091965249
Subject: Information and Communication Engineering
Abstract/Summary:
Sound Event Localization and Detection (SELD) refers to identifying the sound events in an audio recording by their class tags, detecting the onset and offset times of active events, and estimating their spatial locations. This technique not only describes human activity effectively in the spatial dimension, but also helps machines interact better with their environment. SELD can serve as an important module in assistive listening systems, scene information visualization systems, and immersive interactive media. In recent years, deep learning algorithms have developed rapidly, and a variety of deep network frameworks have emerged. This thesis focuses on the joint task of sound event detection and localization using network models built on multi-scale convolutional neural networks. Two problems arise in sound event detection and localization: multiple sound events overlap in time, and the target sound is not easily distinguished from background noise. To address them, this thesis proposes two sound event detection and localization methods based on multi-scale convolutional modules, which further improve the detection and localization performance of the model. The main research work is as follows:

(1) To further improve the feature extraction capability of the model, a multi-scale convolutional module is proposed to replace the convolutional layer in the CRNN. Computational efficiency is improved by running convolutional kernels of different sizes in parallel. The optimal structure of the multi-scale convolutional branches is explored. The detection and localization performance of the model for different numbers of overlapping sound events is also analyzed and compared with other methods on the MANSIM dataset. Compared with the SELDnet method, which uses the CRNN model, the multi-scale CRNN model improves the F1 score by 10%, reduces the ER by 0.16, reduces the DOA error by 13°, and increases the DOA recall by 5%. The comparison experiments show that the multi-scale convolutional module proposed in this thesis improves the performance of sound event detection and localization.

(2) A network model based on CRNN cannot make good use of the association between the two tasks and performs poorly in the presence of ambient noise and reverberation. To solve these problems, a joint SELD network with soft parameter sharing, based on a dual branch attention module, is proposed. Soft parameter sharing effectively exploits the association between the two subtasks. In addition, an attention mechanism is introduced to improve the recognition of sound events in complex acoustic environments. The proposed dual branch attention module (DBAM) has a simpler network structure than Conformer. It models global and local contextual information using two parallel branches, one of attention and one of convolution. The proposed model is evaluated on the TAU-NIGENS Spatial Sound Events 2020 dataset. The computational complexity and number of parameters of the dual branch attention module are compared with those of other attention mechanisms. The impact of soft parameter sharing on the joint network is also analyzed. The network model is used to detect and localize sound events in noisy and reverberant environments and is assessed with the joint evaluation metrics of DCASE2020. Compared with the multi-scale CRNN model, the proposed method achieves a 0.056 reduction in ER20°, a 4.6% improvement in F20° score, a 3° reduction in LECD, and a 3.2% improvement in LRCD, demonstrating the improved detection and localization performance of the model proposed in this thesis.
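
The parallel-kernel idea behind the multi-scale convolutional module in item (1) can be illustrated with a minimal sketch. This is not the thesis architecture: the abstract does not give kernel sizes, branch counts, or weights, so the one-dimensional input, the three moving-average kernels of width 1, 3 and 5, and the function names `conv1d_same` and `multi_scale_block` are all assumptions chosen for illustration. The sketch only shows the general pattern: every branch sees the same input, each branch uses a different receptive field, and the branch outputs are stacked as channels.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide an odd-length kernel over the signal with zero padding,
    so the output has the same length as the input."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_scale_block(
    signal: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Apply every kernel to the same input (one branch per kernel)
    and stack the branch outputs as channels."""
    return [conv1d_same(signal, kernel) for kernel in kernels]


# Hypothetical branch kernels: moving averages of width 1, 3 and 5.
# A trained network would learn these weights instead.
KERNELS = [[1.0], [1 / 3] * 3, [0.2] * 5]

# A single impulse makes the different receptive fields easy to see.
features = multi_scale_block([0.0, 0.0, 1.0, 0.0, 0.0], KERNELS)
for width, channel in zip((1, 3, 5), features):
    print(width, [round(v, 3) for v in channel])
```

The width-1 branch leaves the impulse untouched, while the wider branches spread it over 3 and 5 samples. In a CRNN the same pattern would be applied with learned 2-D kernels over time-frequency features, and the branches are independent of one another, which is what allows them to run in parallel.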
Keywords/Search Tags: sound event detection and localization, multi-scale convolutional neural networks, multi-task learning, attention mechanisms