Research On Sound Event Detection And Location Based On Improved CRNN Model

Posted on: 2022-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhang
GTID: 2518306614959909
Subject: Automation Technology

Abstract/Summary:
The goal of Sound Event Detection and Localization (SEDL) is to identify the class labels of all sound events in an audio clip, their respective onset and offset times, and their directions of arrival, expressed as azimuth and elevation angles. The technology can automatically describe human activities along a spatial dimension and help machines interact more seamlessly with the world. SEDL can serve as an important module in assistive listening systems, scene-information visualization systems and immersive interactive media. Two factors make SEDL difficult. Sounds that occur at the same time overlap with one another, and background noise can occupy frequency bands similar to those of the target events, so the two cannot be separated. To address these problems, and building on a study of deep learning, this thesis improves the Convolutional Recurrent Neural Network (CRNN) and constructs a neural network model with high detection and localization performance.

First, this thesis proposes an SEDL method that combines multi-scale convolutional feature fusion with a recurrent neural network (M-CRNN). The multi-scale feature fusion exploits the hierarchical feature structure of the convolutional neural network to fuse features of different scales, which enriches the relationship between global and local features. This mitigates the insufficient extraction of time-frequency feature-map information by single-size convolution kernels and thereby improves detection and localization accuracy. Experimental results show that the proposed method performs well. Compared with the single-scale CRNN model, the error rate is reduced by 0.24, the F1-score is increased by 20.6%, the localization error is reduced by 8.7 degrees, and the frame recall is increased by 11.6%.

Second, to address the low accuracy of detecting and localizing overlapping sound events, this thesis proposes a method based on a residual network, a spatial and channel squeeze-and-excitation (scSE) attention mechanism and a recurrent neural network. In this method, residual blocks that integrate the scSE attention module replace the ordinary convolution modules. The residual structure resolves the degradation that occurs as the network becomes deeper and strengthens feature extraction. The scSE attention mechanism reinforces the channel and spatial relationships within the convolutional recurrent neural network and makes its feature extraction more targeted, which improves the recognition and localization of overlapping sound events. Detection and localization experiments on the DCASE2019 dataset show the following results for the Res-scSE-CRNN model. Compared with the M-CRNN model, its error rate increases by 0.1 but its F1-score increases by 3.4%. Compared with the CRNN baseline model, its error rate decreases by 0.15 and its F1-score increases by 25.4%. Compared with the M-CRNN model, its localization error is reduced by 22.8 degrees and its frame recall is increased by 3.0%. Taken together, these results show that the Res-scSE-CRNN model clearly improves localization relative to the M-CRNN model.

In conclusion, the multi-scale convolutional feature fusion model uses convolutions at different scales to enhance time-frequency features and to combine complementary features of the same sound event, which improves sound event detection performance. The residual attention fusion model stacks attention-equipped convolution modules within a residual structure. This strengthens feature extraction from key channels and improves sound event localization. SEDL is expected to become an important application in the field of intelligent sensing.
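A small sketch can make the idea of multi-scale convolutional feature fusion concrete. It is written in plain Python on a toy time-frequency map. The following are illustrative assumptions, since the abstract does not give the thesis's actual configuration: three parallel branches with square kernels of size 1, 3 and 5, "same" zero padding, and fusion by stacking the branch outputs as channels. The function names and the fixed averaging kernels (`mean_kernel`) are hypothetical; a trained network learns its kernel weights.

```python
"""Sketch of multi-scale convolution feature fusion (illustrative assumptions only)."""
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """2-D cross-correlation (as in CNN layers) with zero padding; output size = input size."""
    rows, cols = len(x), len(x[0])
    k = len(kernel)  # assumes an odd, square kernel
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    r, c = i + u - pad, j + v - pad
                    if 0 <= r < rows and 0 <= c < cols:  # zero padding outside the map
                        acc += x[r][c] * kernel[u][v]
            out[i][j] = acc
    return out


def mean_kernel(k: int) -> Matrix:
    """Hypothetical fixed k x k averaging kernel standing in for learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale_fusion(x: Matrix, sizes: Sequence[int] = (1, 3, 5)) -> List[Matrix]:
    """Run one branch per kernel size and stack the results as output channels."""
    return [conv2d_same(x, mean_kernel(k)) for k in sizes]


if __name__ == "__main__":
    # Toy 4x6 map: rows are time frames, columns are frequency bins.
    spec = [[float((t * 6 + f) % 5) for f in range(6)] for t in range(4)]
    fused = multi_scale_fusion(spec)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 4 6
```

Small kernels keep fine local detail while larger ones summarize wider time-frequency context. Stacking all branches lets later layers weigh both, which is the motivation the abstract gives for fusing scales instead of relying on a single kernel size.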
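The residual block with spatial and channel squeeze-and-excitation attention described in the abstract can be sketched as follows. The channel branch (cSE) squeezes each channel by global average pooling and passes the result through a small fully connected bottleneck with a sigmoid, producing one gate per channel. The spatial branch (sSE) applies a 1x1 convolution across channels with a sigmoid, producing one gate per time-frequency position. The following are assumptions, not details taken from the thesis: the nested-list tensor layout `[channel][time][frequency]`, the caller-supplied `body` function standing in for the block's convolution layers, combining cSE and sSE by element-wise addition, and all weight values.

```python
"""Sketch of a residual block with concurrent spatial and channel SE (scSE) attention."""
import math
from typing import Callable, List

Tensor3 = List[List[List[float]]]  # [channel][row][col]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_se(x: Tensor3, w1: List[List[float]], w2: List[List[float]]) -> Tensor3:
    """cSE: global average pool per channel, then FC -> ReLU -> FC -> sigmoid gates."""
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]          # C values
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]  # r values
    s = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]     # C gates
    return [[[v * s[c] for v in r] for r in ch] for c, ch in enumerate(x)]


def spatial_se(x: Tensor3, w_s: List[float]) -> Tensor3:
    """sSE: 1x1 convolution over channels gives one sigmoid gate per position."""
    rows, cols = len(x[0]), len(x[0][0])
    q = [[sigmoid(sum(w_s[c] * x[c][i][j] for c in range(len(x)))) for j in range(cols)]
         for i in range(rows)]
    return [[[ch[i][j] * q[i][j] for j in range(cols)] for i in range(rows)] for ch in x]


def scse(x: Tensor3, w1, w2, w_s) -> Tensor3:
    """Concurrent scSE; aggregation by element-wise addition is assumed here."""
    a, b = channel_se(x, w1, w2), spatial_se(x, w_s)
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def residual_scse_block(x: Tensor3, body: Callable[[Tensor3], Tensor3],
                        w1, w2, w_s) -> Tensor3:
    """out = ReLU(x + scSE(body(x))); the identity shortcut eases training of deep stacks."""
    f = scse(body(x), w1, w2, w_s)
    return [[[max(0.0, u + v) for u, v in zip(rx, rf)] for rx, rf in zip(cx, cf)]
            for cx, cf in zip(x, f)]


if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]  # 2 channels of 2x2
    w1 = [[0.5, 0.5]]     # hypothetical weights: C=2 -> r=1
    w2 = [[1.0], [-1.0]]  # hypothetical weights: r=1 -> C=2
    w_s = [0.1, 0.1]      # hypothetical 1x1 conv weights, one per channel
    print(residual_scse_block(x, lambda t: t, w1, w2, w_s))
```

The shortcut addition `x + f` is what the abstract credits with countering degradation in deeper networks. The two gates rescale features along the channel axis and the time-frequency axes respectively.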
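The detection results in the abstract are reported as an error rate and an F1-score. The sketch below assumes the common segment-based definitions used in DCASE-style evaluation. In each segment, substitutions are S = min(FN, FP), deletions are D = max(0, FN − FP) and insertions are I = max(0, FP − FN). The error rate is ER = (S + D + I) / N, where N is the number of active reference events, and F1 = 2TP / (2TP + FP + FN). The segmentation and the function name `sed_scores` are illustrative; the thesis's exact evaluation settings are not stated in the abstract.

```python
"""Sketch of segment-based SED metrics: error rate (ER) and F1-score."""
from typing import List, Set, Tuple


def sed_scores(reference: List[Set[str]], predicted: List[Set[str]]) -> Tuple[float, float]:
    """Each list element holds the class labels active in one aligned segment."""
    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0
    for ref, est in zip(reference, predicted):
        seg_tp, seg_fn, seg_fp = len(ref & est), len(ref - est), len(est - ref)
        tp, fp, fn = tp + seg_tp, fp + seg_fp, fn + seg_fn
        subs += min(seg_fn, seg_fp)     # a miss paired with a false alarm
        dels += max(0, seg_fn - seg_fp)  # leftover misses
        ins += max(0, seg_fp - seg_fn)   # leftover false alarms
        n_ref += len(ref)
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return er, f1


if __name__ == "__main__":
    reference = [{"speech"}, {"speech", "door"}, set(), {"phone"}]
    predicted = [{"speech"}, {"speech"}, {"door"}, {"keys"}]
    print(*sed_scores(reference, predicted))  # 0.75 0.5
```

Lower ER is better, and ER can exceed 1. Higher F1 is better. Because ER counts a substitution as one error while F1 counts it as one false positive plus one false negative, the two metrics need not move in opposite directions. This is consistent with a model showing a higher error rate and a higher F1-score at the same time.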
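The localization results are reported as a localization (DOA) error in degrees and a frame recall. The sketch below is a simplified illustration, not the official DCASE2019 scoring code. It measures the great-circle angle between predicted and reference (azimuth, elevation) pairs and pairs sources by the cheapest permutation. It averages the DOA error only over frames whose predicted and reference source counts match. It defines frame recall as the fraction of frames in which the predicted number of sources equals the reference number. The function names and the toy data are hypothetical.

```python
"""Simplified sketch of DOA error and frame recall for sound event localization."""
import itertools
import math
from typing import List, Tuple

Doa = Tuple[float, float]  # (azimuth, elevation) in degrees


def angular_distance(a: Doa, b: Doa) -> float:
    """Great-circle angle, in degrees, between two directions of arrival."""
    az1, el1, az2, el2 = map(math.radians, (*a, *b))
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))  # clamp rounding error


def doa_metrics(reference: List[List[Doa]], predicted: List[List[Doa]]) -> Tuple[float, float]:
    """Return (mean DOA error in degrees, frame recall) over aligned frames.

    Simplification: only count-matched frames contribute to the DOA error, and
    sources are paired by brute-force search over permutations, which is
    adequate only for a handful of overlapping sources.
    """
    total_error, n_pairs, matched_frames = 0.0, 0, 0
    for ref, est in zip(reference, predicted):
        if len(ref) != len(est):
            continue  # source-count mismatch: a miss for frame recall
        matched_frames += 1
        if ref:
            total_error += min(
                sum(angular_distance(r, e) for r, e in zip(ref, perm))
                for perm in itertools.permutations(est)
            )
            n_pairs += len(ref)
    doa_error = total_error / n_pairs if n_pairs else float("nan")
    frame_recall = matched_frames / len(reference) if reference else 0.0
    return doa_error, frame_recall


if __name__ == "__main__":
    ref = [[(0.0, 0.0)], [(90.0, 0.0), (0.0, 0.0)], [], [(45.0, 10.0)]]
    est = [[(10.0, 0.0)], [(0.0, 0.0), (80.0, 0.0)], [(30.0, 0.0)], []]
    err, recall = doa_metrics(ref, est)
    print(round(err, 3), recall)  # 6.667 0.5
```

A lower DOA error and a higher frame recall both indicate better localization. The two are complementary: a model can place the sources it detects accurately while still miscounting how many sources are active.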
Keywords/Search Tags: sound event detection and location, convolutional recurrent neural network, multi-scale feature fusion, attention mechanisms