
Research On Polyphonic Sound Event Detection With Deep Neural Network

Posted on: 2020-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Liu
GTID: 2428330572487270
Subject: Information and Communication Engineering
Abstract/Summary:
Sound is an important source of information through which humans perceive their surroundings and communicate with each other, and it has long attracted the interest of researchers. Polyphonic sound event detection (PSED) aims to analyze an audio signal automatically and determine which events it contains, such as "speech" or "footsteps", including overlapping cases such as "speech" occurring while "footsteps" are ongoing. PSED has promising applications in security monitoring, anomaly detection, situation awareness, biological monitoring, and content retrieval. Traditional PSED systems mainly use non-negative matrix factorization (NMF) or hidden Markov models with Gaussian mixture models (HMM-GMM). In recent years, with the rapid development of deep learning, models based on deep neural networks have brought breakthroughs in PSED performance. Networks such as deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN) have been applied successfully to PSED. However, existing deep learning techniques still fall short on two important and difficult problems in PSED: overlapping events and the lack of sufficient labeled data. As a result, the overall performance of PSED remains poor, which makes practical application difficult.

This dissertation focuses on these two problems and studies PSED with deep neural networks. Firstly, from the perspective of features, a baseline system is built on a CNN-RNN model. The CNN extracts the spectral structure of events from the input features, and the RNN models the temporal dependency. Experiments show that this approach achieves better performance than traditional approaches. Secondly, to address event overlap, a PSED approach called CapsNet-RNN is proposed. In this approach, events are modeled from multiple perspectives using neurons called capsules, and a routing algorithm enables the network to predict events from local features. An RNN is further applied to learn context information. Experiments show that the model is able to select feature bands and channels when identifying different events, which improves detection performance, especially when events overlap. In addition, to address the lack of labeled data, a semi-supervised learning method called self-training is applied to PSED. Experiments show that this method can significantly increase the amount of trainable data and improve detection performance. Finally, two sound-related databases based on transformers are constructed, and the validity of the CNN-RNN and CapsNet-RNN methods is demonstrated in the transformer scenario.
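The abstract does not say how self-training is implemented in the thesis. As a rough illustration of the general technique only, the sketch below assumes clip-level multi-label predictions and a simple confidence rule: an unlabeled clip is pseudo-labeled only if every class probability is clearly "on" or clearly "off". The thresholds `low` and `high`, the `fit` and `predict` callables, and all function names are hypothetical and are not taken from the thesis.

```python
from typing import Any, Callable, List, Sequence, Tuple

Labels = List[int]  # multi-hot labels, one entry per event class


def select_pseudo_labels(
    probs_per_clip: Sequence[Sequence[float]],
    low: float = 0.1,   # assumed "confidently absent" threshold
    high: float = 0.9,  # assumed "confidently present" threshold
) -> List[Tuple[int, Labels]]:
    """Return (clip index, multi-hot labels) for confidently predicted clips."""
    selected = []
    for idx, probs in enumerate(probs_per_clip):
        # Keep a clip only if no class probability falls in the uncertain band.
        if all(p <= low or p >= high for p in probs):
            selected.append((idx, [1 if p >= high else 0 for p in probs]))
    return selected


def self_train(
    fit: Callable[[list], Any],                      # hypothetical trainer
    predict: Callable[[Any, Any], Sequence[float]],  # hypothetical predictor
    labeled: Sequence[Tuple[Any, Labels]],
    unlabeled: Sequence[Any],
    rounds: int = 3,
) -> Tuple[Any, list]:
    """Generic self-training loop: train, pseudo-label, enlarge, retrain."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        model = fit(labeled)
        picked = select_pseudo_labels([predict(model, x) for x in unlabeled])
        if not picked:
            break  # nothing confident enough to add
        chosen = {i for i, _ in picked}
        # Move confident clips, with their pseudo-labels, into the training set.
        labeled += [(unlabeled[i], y) for i, y in picked]
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in chosen]
    # Final model trained on the original plus pseudo-labeled data.
    return fit(labeled), labeled
```

With classes ordered as ("speech", "footsteps"), `select_pseudo_labels([[0.95, 0.02], [0.6, 0.1]])` keeps only the first clip and labels it `[1, 0]`. The second clip is skipped because 0.6 lies in the uncertain band.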
Keywords/Search Tags: Sound Event Detection, Polyphonic Sound Event Detection, Deep Neural Networks, Capsule Networks, Semi-supervised Learning