Research On Sound Event Detection Technology In Domestic Environment

Posted on: 2022-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: D C Yu
GTID: 2518306788456114
Subject: Automation Technology
Abstract/Summary:
Sound event detection analyzes recorded audio to identify the categories of sound events and their time boundaries. With the growing demand for elderly care, the technology has broad application prospects in domestic environments. At present, however, strongly labeled training data for sound event detection is scarce, which limits the performance of neural network models. To address this problem, this thesis conducts the following research.

First, this thesis constructs a convolutional recurrent neural network (CRNN) that takes the log-Mel spectrogram of the audio as its input feature and uses long-term context information to extract high-level audio features effectively. Based on the CRNN, it proposes two weakly supervised sound event detection methods trained with weakly labeled data, reducing the model's dependence on strongly labeled data. The two methods are based on the multiple instance learning hypothesis and on class activation mapping, respectively. The former uses a multiple instance pooling module to aggregate the frame-level predictions generated by the CRNN into clip-level predictions and focuses on foreground frames during training, so that it can learn from weakly labeled data. The latter uses class activation mapping to learn pseudo strong labels directly from weakly labeled data and applies an erasure technique to improve the accuracy of those pseudo strong labels, which are then used to train the CRNN model.

Next, this thesis designs the overall semi-supervised sound event detection model. Through the mean teacher structure, weakly labeled, unlabeled, and synthetic audio data are used to train the model at the same time. Data augmentation is applied during training to improve the model's generalization ability. These measures improve the performance of the semi-supervised models as far as possible without using strongly labeled data. Based on the best-performing semi-supervised model, this thesis develops a visual algorithm demonstration platform with Python and the PyQt library, which shows the effect of the sound event detection algorithm more intuitively and provides a preliminary framework for future practical application of the algorithm.

To verify the performance of the models, experiments are carried out on the DCASE2021 Task 4 dataset. The results show that the overall performance of both proposed semi-supervised models improves on the dataset's baseline model. The "MIL-WSU-M" model, based on multiple instance learning, achieves a PSDS1 of 0.36 and a PSDS2 of 0.56, increases of 5.88% and 7.69% over the baseline, respectively. The "CAM-WSU-M" model, based on class activation mapping, achieves a PSDS1 of 0.31 and a PSDS2 of 0.57, which are 8.82% lower and 9.61% higher than the baseline, respectively.
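The two mechanisms named in the abstract, multiple instance pooling and the mean teacher structure, can be illustrated with a minimal Python sketch. The abstract does not say which pooling function or teacher update rule the thesis uses, so linear softmax pooling, the exponential moving average (EMA) update, the default smoothing factor `alpha`, and the function names below are all assumptions chosen for illustration, not the author's implementation.

```python
from typing import List, Sequence


def linear_softmax_pool(frame_probs: Sequence[float]) -> float:
    """Aggregate frame-level probabilities of one class into a clip-level value.

    Assumed pooling rule (linear softmax): each frame is weighted by its own
    probability, y = sum(p_t^2) / sum(p_t), so confident "foreground" frames
    dominate the clip-level prediction more than they would in a plain mean.
    """
    total = sum(frame_probs)
    if total == 0.0:
        # No frame shows any evidence for the class.
        return 0.0
    return sum(p * p for p in frame_probs) / total


def ema_update(teacher: List[float], student: Sequence[float],
               alpha: float = 0.999) -> None:
    """Mean-teacher step: move the teacher weights toward the student, in place.

    teacher <- alpha * teacher + (1 - alpha) * student
    The value of alpha is an illustrative assumption.
    """
    for i, s in enumerate(student):
        teacher[i] = alpha * teacher[i] + (1.0 - alpha) * s


if __name__ == "__main__":
    # Hypothetical frame-level outputs for one class over four frames.
    frames = [0.1, 0.1, 0.9, 0.8]
    print("clip-level prediction:", linear_softmax_pool(frames))

    # Hypothetical flattened weight vectors for the teacher and the student.
    teacher_weights = [0.0, 1.0]
    student_weights = [1.0, 1.0]
    ema_update(teacher_weights, student_weights, alpha=0.9)
    print("teacher after one EMA step:", teacher_weights)
```

In this sketch the clip-level value always lies between the mean and the maximum of the frame probabilities, which is one way a pooling module can emphasize foreground frames while still training from clip-level (weak) labels only.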
Keywords/Search Tags: Sound Event Detection, Neural Network, Weakly Supervised Learning, Multiple Instance Learning, Class Activation Mapping, Semi-Supervised Learning