Research On Sound Event Detection And Localization Method Based On Multi-Scale Convolutional Neural Network

Posted on:2024-06-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Zhou

Full Text:PDF

GTID:2568307091965249

Subject:Information and Communication Engineering

Abstract/Summary:


Sound Event Localization and Detection (SELD) refers to identifying sound events in audio together with their class labels, detecting the onset and offset times of each active event, and estimating its spatial location. The technique can describe human activity in the spatial dimension and helps machines interact better with their environment. SELD can serve as an important module in assistive listening systems, scene information visualization systems, and immersive interactive media. In recent years, deep learning algorithms have developed rapidly and many deep network frameworks have emerged. This thesis addresses the joint task of sound event detection and localization with network models built on multi-scale convolutional neural networks. Two problems arise in this task: multiple sound events overlap in time, and target sounds are hard to distinguish from background noise. To address them, this thesis proposes two sound event detection and localization methods based on multi-scale convolutional modules, which further improve the detection and localization performance of the model. The main research work is as follows:

(1) To improve the feature extraction capability of the model, a multi-scale convolutional module is proposed to replace the convolutional layers in a CRNN. Computational efficiency is improved by running convolutional kernels of different sizes in parallel. The optimal structure of the multi-scale convolutional branches is explored, the detection and localization performance of the model for different numbers of overlapping sound events is analyzed, and the model is compared with other methods on the MANSIM dataset. Compared with SELDnet, which uses a CRNN model, the multi-scale CRNN model improves the F₁ score by 10%, reduces the error rate (ER) by 0.16, reduces the direction-of-arrival (DOA) error by 13°, and increases the DOA recall by 5%. These comparison experiments show that the proposed multi-scale convolution module improves sound event detection and localization performance.

(2) CRNN-based network models cannot make good use of the association between the two subtasks and perform poorly under ambient noise and reverberation. To solve these problems, a joint SELD network with soft parameter sharing based on a dual branch attention module is proposed. Soft parameter sharing effectively exploits the association between the two subtasks. In addition, an attention mechanism is introduced to improve sound event recognition in complex acoustic environments. The proposed dual branch attention module (DBAM) has a simpler network structure than the Conformer: it models global and local contextual information with two parallel branches, one of attention and one of convolution. The proposed model is evaluated on the TAU-NIGENS Spatial Sound Events 2020 dataset. The computational complexity and number of parameters of the DBAM are compared with those of other attention mechanisms, and the impact of soft parameter sharing on the joint network is analyzed. The network detects and localizes sound events in noisy and reverberant environments and is evaluated with the joint SELD metrics of DCASE2020 (the location-aware error rate ER_20° and F-score F_20° with a 20° threshold, and the class-dependent localization error LE_CD and localization recall LR_CD). Compared with the multi-scale CRNN model, the proposed method reduces ER_20° by 0.056, improves F_20° by 4.6%, reduces LE_CD by 3°, and improves LR_CD by 3.2%, demonstrating improved detection and localization performance.
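
The multi-scale convolutional module in (1) runs convolution kernels of different sizes in parallel on the same input. A minimal pure-Python sketch of that idea follows. It assumes (none of this is taken from the thesis) kernel sizes 1, 3 and 5, one filter per branch, zero "same" padding, ReLU, and stacking of the branch outputs along a channel axis; the averaging kernels are hypothetical stand-ins for learned weights.

```python
"""Multi-scale convolution sketch (pure Python, single input channel).

Assumptions, not from the thesis: kernel sizes 1/3/5, one filter per branch,
stride 1, zero "same" padding, ReLU, and channel-wise stacking of branches.
"""
from typing import List

Matrix = List[List[float]]  # [time][frequency]


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """2-D cross-correlation with zero padding; output keeps x's shape."""
    rows, cols = len(x), len(x[0])
    k = len(kernel)  # odd, square kernel assumed
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += x[r][c] * kernel[di][dj]
            out[i][j] = acc
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def multi_scale_block(x: Matrix, kernels: List[Matrix]) -> List[Matrix]:
    """Apply every kernel to the same input in parallel branches and stack
    the results along a new channel axis: output is [branch][time][freq]."""
    return [relu(conv2d_same(x, k)) for k in kernels]


def averaging_kernel(size: int) -> Matrix:
    """Hypothetical fixed kernel for illustration; a real model learns these."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


if __name__ == "__main__":
    # Toy 4 x 4 time-frequency map with a single bright bin.
    spec = [[0.0] * 4 for _ in range(4)]
    spec[1][1] = 9.0
    features = multi_scale_block(spec, [averaging_kernel(s) for s in (1, 3, 5)])
    print(len(features), len(features[0]), len(features[0][0]))  # 3 4 4
```

Because every branch preserves the time-frequency shape, the outputs can be stacked as channels and passed to the next CRNN layer. Summing the branches instead of stacking them is an equally plausible design that the abstract does not rule out.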
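
The abstract does not state how soft parameter sharing is implemented in (2). One common scheme is the cross-stitch unit from cross-stitch networks for multi-task learning: the SED and DOA branches keep separate parameters, and a learnable matrix decides how much of each branch's features flows into the other. The sketch below assumes that scheme, with a single 2×2 mixing matrix applied to every feature element for brevity; it is not necessarily the thesis's method.

```python
"""Cross-stitch unit: one possible realisation of soft parameter sharing
between an SED branch and a DOA branch (an assumption; the abstract does not
name the scheme). A single 2x2 matrix is used for the whole vector."""
from typing import List, Tuple

Vector = List[float]
Mix = Tuple[Tuple[float, float], Tuple[float, float]]


def cross_stitch(sed: Vector, doa: Vector, alpha: Mix) -> Tuple[Vector, Vector]:
    """Linearly mix the two task feature vectors:

    new_sed = a_ss * sed + a_sd * doa
    new_doa = a_ds * sed + a_dd * doa
    """
    (a_ss, a_sd), (a_ds, a_dd) = alpha
    if len(sed) != len(doa):
        raise ValueError("branch features must have the same size")
    new_sed = [a_ss * s + a_sd * d for s, d in zip(sed, doa)]
    new_doa = [a_ds * s + a_dd * d for s, d in zip(sed, doa)]
    return new_sed, new_doa


if __name__ == "__main__":
    sed_feat, doa_feat = [1.0, 2.0], [3.0, 4.0]
    # Identity matrix: branches stay independent (no sharing).
    print(cross_stitch(sed_feat, doa_feat, ((1.0, 0.0), (0.0, 1.0))))
    # All 0.5: both branches receive identical features (like hard sharing).
    print(cross_stitch(sed_feat, doa_feat, ((0.5, 0.5), (0.5, 0.5))))
```

The identity matrix and the all-0.5 matrix bracket the behaviour: no sharing at one end and fully shared features at the other. Learning the matrix lets training settle between those extremes.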
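
The DBAM in (2) models global context with an attention branch and local context with a convolution branch that run in parallel. The sketch below illustrates only that two-branch structure. Single-head attention without learned projections, the fixed depthwise kernel, and fusion by summation plus a residual connection are all assumptions, not details given in the abstract.

```python
"""Dual-branch block sketch: self-attention (global context) and depthwise
1-D convolution over time (local context) in parallel, fused by summation
with a residual connection. All design details here are assumptions."""
import math
from typing import List

Seq = List[List[float]]  # [time][feature]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Seq) -> Seq:
    """Scaled dot-product attention with Q = K = V = x (global branch)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def depthwise_conv1d(x: Seq, kernel: List[float]) -> Seq:
    """Zero-padded 1-D convolution along time, applied to each feature
    independently (local branch)."""
    t_len, d, pad = len(x), len(x[0]), len(kernel) // 2
    out = [[0.0] * d for _ in range(t_len)]
    for t in range(t_len):
        for j in range(d):
            for k, w in enumerate(kernel):
                src = t + k - pad
                if 0 <= src < t_len:
                    out[t][j] += w * x[src][j]
    return out


def dual_branch_block(x: Seq, kernel: List[float]) -> Seq:
    """Input + global branch + local branch, element-wise."""
    g = self_attention(x)
    loc = depthwise_conv1d(x, kernel)
    return [[xi + gi + li for xi, gi, li in zip(xr, gr, lr)]
            for xr, gr, lr in zip(x, g, loc)]


if __name__ == "__main__":
    frames = [[0.1, 0.5], [0.9, 0.2], [0.4, 0.4]]
    print(dual_branch_block(frames, [0.25, 0.5, 0.25]))
```

A Conformer block applies feed-forward, attention and convolution modules one after another; here the two branches run side by side on the same input, which is consistent with the simpler structure the abstract describes.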
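
The results above rest on two metric ingredients: the angular distance between estimated and reference DOAs, and the error rate ER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions and N counts reference events. The sketch below computes both, plus the DCASE2020 location-aware rule that a detection counts as correct for ER_20° and F_20° only when its class matches and its DOA error is within 20°. It is simplified: the official evaluation code also groups frames into segments and matches multiple events per class, and treating the 20° bound as inclusive is an assumption.

```python
"""Simplified SELD metric ingredients: angular DOA error, the DCASE2020
location-aware hit rule, and ER = (S + D + I) / N. Not the official code."""
import math


def angular_distance_deg(az1: float, el1: float, az2: float, el2: float) -> float:
    """Great-circle angle in degrees between two (azimuth, elevation) DOAs."""
    az1, el1, az2, el2 = map(math.radians, (az1, el1, az2, el2))
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    cos_d = max(-1.0, min(1.0, cos_d))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_d))


def is_location_aware_hit(ref_class: str, est_class: str,
                          doa_error_deg: float, threshold_deg: float = 20.0) -> bool:
    """Correct class AND DOA error within the threshold (inclusive: assumed)."""
    return ref_class == est_class and doa_error_deg <= threshold_deg


def error_rate(substitutions: int, deletions: int, insertions: int, n_ref: int) -> float:
    """Error rate over the evaluated segments; n_ref = number of reference events."""
    return (substitutions + deletions + insertions) / n_ref


if __name__ == "__main__":
    err = angular_distance_deg(30.0, 10.0, 45.0, 0.0)
    print(round(err, 2), is_location_aware_hit("dog", "dog", err))
    print(error_rate(1, 2, 1, 8))
```

Using the great-circle angle rather than separate azimuth and elevation errors avoids overstating azimuth error near the poles, where azimuth differences correspond to small spatial distances.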

Keywords/Search Tags:

sound event detection and localization, multi-scale convolutional neural networks, multi-task learning, attention mechanisms


Related items

1	Research On Sound Event Detection And Location Based On Improved CRNN Model
2	Research On Multi-sound Event Localization And Detection Method Based On Deep Learning
3	Neighbourhood Similarity Augmentation On Multi-source Sound Event Detection And Localization
4	Research On Sound Event Detection Based On Deep Learning
5	Research On Deep Learning For Sound Event Detection
6	Semi-supervised Sound Event Detection Based On Deep Neural Network
7	Object Detection Algorithm Research Based On Multi-feature Multi-scale Convolutional Neural Networks
8	Research On Polyphonic Sound Event Detection Algorithm Based On Multi-layer Neural Network
9	Research On Dense Object Detection Algorithm Based On Convolutional Neural Networks
10	Face Detection Algorithm With Multi-scale And Multi-task Based On Region Convolution