Study On Polyphonic Sound Event Detection Based On Deep Learning

Posted on: 2022-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2518306509993209
Subject: Electronics and Communications Engineering
Abstract/Summary:
Polyphonic sound event detection is a technology that classifies audio events and marks their start and end times. It has broad application prospects in public safety, smart homes, multimedia information retrieval, and other fields. Recognition is difficult because different sound events sometimes occur at the same time and overlap with each other, and because background noise is present. At present, when the requirements on the detected start and end times are relatively strict, supervised learning methods that rely on strongly labeled datasets are often used. However, strongly labeled datasets are scarce, and high-performance deep network models often have a large number of parameters, which makes them difficult to deploy on embedded systems. Building a neural network model with few parameters and high recognition performance from a limited dataset is therefore a problem that sound event detection needs to solve. This thesis studies a polyphonic sound event detection system based on deep learning. The main work is as follows:

(1) A polyphonic sound event detection method based on a residual network and a Recurrent Neural Network is proposed. In this method, the residual network improves recognition accuracy by increasing the network depth and addresses the network degradation problem, which strengthens feature extraction. To enlarge the receptive field and improve recognition performance, dilated convolution replaces the ordinary convolution in the residual network. The Recurrent Neural Network captures long-term dependencies so that context information is fully extracted. Experiments on the evaluation dataset of TUT-sound-events-2017 show that this method has good recognition performance, with an error rate 6.3% lower than that of the multi-scale fully convolutional network (MS-FCN) model. Experiments are also conducted on the Freesound-noise series datasets. Compared with the MS-FCN and Convolutional Recurrent Neural Network (CRNN) models, this method achieves higher recognition performance across different signal-to-noise ratios and numbers of overlapping event categories.

(2) A polyphonic sound event detection method based on Depthwise Separable Convolution, Squeeze-and-Excitation (SE), and a Recurrent Neural Network is proposed. In addition to using the Recurrent Neural Network to learn long-term dependencies in the sound, this method replaces the ordinary convolution with Depthwise Separable Convolution to reduce the number of model parameters and the amount of computation. The thesis also uses the SE attention mechanism to learn the importance of different channel features and weights the channel features with the learned coefficients to improve the recognition performance of the model. Experimental results on the TUT-sound-events-2017 datasets show that, compared with the MS-FCN model, this model reduces the error rate by 0.9% and increases the F1 score by 0.4% on the development dataset, and reduces the error rate by 6.4% and increases the F1 score by 0.9% on the evaluation dataset, with only 110,000 parameters. This shows that the model achieves higher recognition performance even with fewer parameters. In addition, experimental results on the Freesound-noise series datasets show that the recognition performance of this method is higher than that of MS-FCN and CRNN across different signal-to-noise ratios and numbers of overlapping event categories.
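The three building blocks named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the thesis implementation: the channel count (64), kernel size (3), dilation rates (1, 2, 4), and the toy SE weights `w1` and `w2` are all assumptions chosen for illustration, and the function names are hypothetical. The sketch shows why a depthwise separable convolution has fewer weights than an ordinary one, how dilation enlarges the receptive field of stacked stride-1 convolutions, and how SE turns per-channel averages into gates that rescale each channel.

```python
import math


def conv_params(c_in, c_out, k):
    """Weight count of an ordinary k x k 2-D convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """One k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def receptive_field(kernel, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf


def se_reweight(feature_maps, w1, w2):
    """Squeeze-and-Excitation over channels given as flat lists of activations.

    w1 has shape (reduced, channels); w2 has shape (channels, reduced).
    Both are assumed, hand-picked weights here, not learned ones.
    """
    # Squeeze: global average of each channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: fully connected + ReLU, then fully connected + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation of a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, feature_maps)]
    return gates, scaled


if __name__ == "__main__":
    print(conv_params(64, 64, 3), depthwise_separable_params(64, 64, 3))
    print(receptive_field(3, [1, 1, 1]), receptive_field(3, [1, 2, 4]))
    gates, _ = se_reweight([[1.0, 3.0], [0.0, 0.0]], [[1.0, 1.0]], [[1.0], [-1.0]])
    print(gates)
```

Under these assumed sizes, the ordinary 3 x 3 layer has 36,864 weights and the depthwise separable one has 4,672, and three 3-wide layers cover 7 positions without dilation versus 15 with dilation rates 1, 2, 4. These figures describe the toy configuration only and say nothing about the 110,000-parameter model reported in the thesis.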
Keywords/Search Tags: Polyphonic Sound Event Detection, Residual Network, Dilated Convolution, Depthwise Separable Convolution, Squeeze-and-Excitation