Research On Polyphonic Sound Event Detection With Deep Neural Network

Posted on:2020-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Liu

Full Text:PDF

GTID:2428330572487270

Subject:Information and Communication Engineering

Abstract/Summary:

As an important source of information through which humans perceive their surroundings and communicate with each other, sound has long attracted the interest of researchers. Polyphonic sound event detection (PSED) aims to automatically determine which events a recording contains, such as "speech" or "footsteps", including overlapping cases such as "speech" occurring while "footsteps" are ongoing. PSED has promising applications in security monitoring, anomaly detection, situation awareness, biological monitoring, and content retrieval. Traditional PSED systems mainly use non-negative matrix factorization (NMF) and hidden Markov models with Gaussian mixture models (HMM-GMM). In recent years, with the rapid development of deep learning, models based on deep neural networks have brought breakthroughs in PSED performance. Networks such as deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN) have been successfully applied to PSED. However, existing deep learning techniques still fall short on two important and difficult problems in PSED: overlapping events and the lack of sufficient data. As a result, the overall performance of PSED remains poor, which greatly hinders its application.

This dissertation focuses on these two problems and studies PSED with deep neural networks. First, from the perspective of features, a baseline system is built on a CNN-RNN model. The CNN extracts the spectral structure of events from the input features, and the RNN models temporal dependencies. Experiments show that this approach outperforms traditional approaches. Second, to address event overlap, a PSED approach called CapsNet-RNN is proposed. In this approach, events are modeled from multiple perspectives by neurons called capsules, and a routing algorithm enables the network to predict events from local features. An RNN is additionally applied to learn context information. Experiments show that the model is able to select feature bands and channels when identifying different events, improving detection performance, especially in overlapping situations. In addition, to address the lack of labeled data, a semi-supervised learning method called self-training is applied to PSED. Experiments show that this method significantly increases the amount of trainable data and improves detection performance. Finally, two sound-related databases based on transformers are constructed, and the validity of the CNN-RNN and CapsNet-RNN methods is demonstrated in the transformer scenario.
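
The self-training idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the clip identifiers, and the confidence thresholds (0.9 and 0.1) are assumptions chosen for illustration. The sketch shows only the pseudo-labeling step, in which clips whose per-class predictions are all confident are moved from the unlabeled pool into the training set.

```python
from typing import Dict, List, Tuple

# Hypothetical types: clip id -> per-class probabilities or binary labels.
Predictions = Dict[str, List[float]]
Labels = Dict[str, List[int]]


def select_pseudo_labels(preds: Predictions, hi: float = 0.9, lo: float = 0.1) -> Labels:
    """Keep clips whose every class probability is confidently high or low.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    selected: Labels = {}
    for clip_id, probs in preds.items():
        if all(p >= hi or p <= lo for p in probs):
            # Multi-label target: several events may be active at once.
            selected[clip_id] = [1 if p >= hi else 0 for p in probs]
    return selected


def self_training_round(
    labeled: Labels, preds: Predictions, hi: float = 0.9, lo: float = 0.1
) -> Tuple[Labels, List[str]]:
    """Grow the labeled set with confident pseudo-labels.

    Returns the enlarged training set and the ids still left unlabeled.
    """
    pseudo = select_pseudo_labels(preds, hi, lo)
    grown = dict(labeled)
    grown.update(pseudo)
    remaining = [clip_id for clip_id in preds if clip_id not in pseudo]
    return grown, remaining


# Classes are ["speech", "footsteps"]; the probabilities stand in for model output.
labeled = {"clip_a": [1, 0]}
preds = {"clip_b": [0.97, 0.95], "clip_c": [0.55, 0.02]}
grown, remaining = self_training_round(labeled, preds)
print(grown)      # clip_b is added with both events active
print(remaining)  # clip_c stays unlabeled: 0.55 is not confident
```

In a full loop, the model would be retrained on the enlarged set and the remaining clips predicted again, repeating until no further clips pass the thresholds.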

Keywords/Search Tags:

Sound Event Detection, Polyphonic Sound Event Detection, Deep Neural Networks, Capsule Networks, Semi-supervised Learning

Related items

1	The Research Of Sound Event Classification And Detection On Semi-supervised Learning Method
2	Study On Polyphonic Sound Event Detection Based On Deep Learning
3	Research On Sound Event Detection Technology In Domestic Environment
4	Research On Sound Event Classification And Detection Method Based On Semi-supervised Learning
5	Research On Sound Event Detection Based On Deep Learning
6	Research On Sound Event Recognition Based On Deep Learning
7	Weakly Supervised Sound Event Recognition On Noisy Label Dataset
8	Neighbourhood Similarity Augmentation On Multi-source Sound Event Detection And Localization
9	Weakly Supervised Learning For Audio Analysis
10	Sound Detection, Classification And Localization Under Noise Conditions