
Research On Environment-assisted Polyphonic Acoustic Event Detection

Posted on: 2020-01-28    Degree: Master    Type: Thesis
Country: China    Candidate: L J Gao    Full Text: PDF
GTID: 2428330596497071    Subject: Computer technology
Abstract/Summary:
As a primary carrier of information, sound signals carry a wealth of content, and automatically capturing this information by computer for polyphonic Acoustic Event Detection (AED) has gradually become a mainstream research topic. The goal of AED is to identify the sound events occurring in a continuous acoustic signal and to mark their onset and offset times. AED can serve a variety of application scenarios, such as acoustic monitoring, so its study has positive and important social significance.

At present, existing AED methods learn only overall features to distinguish different sounds. When the number of sound categories is large and the scene is complicated, overall features cannot represent each event category well, which degrades detection performance, especially when there are many target events; yet there has been little research on learning a discriminative feature for each type of event. In addition, existing methods do not take environmental context into account, although it carries much useful information for guiding the detection of sound events. Ignoring this information prevents environment-robust sound event detection: the same event occurring against different environmental backgrounds cannot be detected reliably. To address these two challenges, this paper proposes a novel feature learning method based on disentangled representation learning and an environment-assisted sound event detection method. The main contents and contributions of this paper are as follows:

(1) We propose an event-specific feature learning method based on disentangled representation. The main idea is to introduce novel disentangling constraints into the β-variational auto-encoder (β-VAE), which extracts the generative factors of the sound signal and disentangles the event-specific latent factors from them, learning a feature representation for each event in two ways: feature blocks and an attention mechanism. Experiments show that this method significantly improves the performance of sound event detection, even when a large number of acoustic events must be detected. On the DCASE 2017 challenge, our method outperforms the top performers (top-1 F1 J-NEAT and top-1 ER CRNN), and on the Freesound dataset its AED performance remains strong even with a large variety of events (for example, with 20 event types in the dataset, the F1 of the proposed method is 85.09%, far higher than the baseline DNN at 41.39% and the mainstream CRNN at 71.30%).

(2) We propose an Environment-Assisted MultiTask polyphonic acoustic event detection model (EAMT). The main idea of EAMT is to learn environment-discriminative context features in a multitask learning manner, where the context features capture knowledge of the background environment in which sound events occur, as well as the hidden information in the environment related to those events. Using these context features as additional information to assist acoustic event detection improves both the model's robustness to environmental changes and the performance of AED. Experiments show that the environment-assisted multitask model further improves sound event detection on benchmark datasets. Moreover, EAMT proves robust to environmental changes, i.e., the same kind of event occurring in different scenarios can be detected well: on the Freesound dataset, evaluating AED performance across different environments, the F1 of the proposed method is 87.01%, while the baseline DNN and the mainstream CRNN achieve 82.24% and 85.76%, respectively.

(3) We design and implement a prototype system for environment-assisted acoustic event detection. The system is built with the Python programming language and the Keras package, and consists of four modules, including data preprocessing, event-related feature extraction, and the environment-assisted multitask acoustic event detection model. It implements both methods described above: the event-specific feature learning method based on disentangled representation and the EAMT model for polyphonic sound event detection.
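To make the first contribution concrete, the standard β-VAE objective that the disentangling constraints build on can be sketched numerically. This is a generic β-VAE loss under the usual diagonal-Gaussian assumptions, not the thesis's exact formulation; the weight `beta` and the mean-squared reconstruction term are illustrative choices.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Generic beta-VAE objective: reconstruction error plus a
    beta-weighted KL divergence; beta > 1 strengthens the pressure
    on the latent factors to disentangle."""
    recon = np.mean((x - x_recon) ** 2)  # reconstruction term (MSE)
    # KL( N(mu, sigma^2) || N(0, I) ) in closed form for diagonal Gaussians
    kl = -0.5 * np.mean(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + beta * kl
```

With a perfect reconstruction and a latent posterior equal to the standard-normal prior (`mu = 0`, `log_var = 0`), both terms vanish and the loss is zero; increasing `beta` trades reconstruction fidelity for more strongly factorized latents.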
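The multitask idea behind EAMT can likewise be sketched as a joint objective: per-event binary cross-entropy for polyphonic detection plus a weighted cross-entropy for an auxiliary environment-classification task over shared features. The balancing weight `lam` and the specific loss forms are hypothetical illustrations, not values taken from the thesis.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def eamt_joint_loss(event_logits, event_targets, env_logits, env_labels, lam=0.5):
    """Sketch of a multitask objective: multi-label BCE for the main
    polyphonic event-detection task plus a lam-weighted cross-entropy
    for the auxiliary environment-classification task."""
    eps = 1e-12
    p = 1.0 / (1.0 + np.exp(-event_logits))  # independent sigmoid per event
    bce = -np.mean(event_targets * np.log(p + eps)
                   + (1.0 - event_targets) * np.log(1.0 - p + eps))
    q = softmax(env_logits)                  # one environment per clip
    ce = -np.mean(np.log(q[np.arange(len(env_labels)), env_labels] + eps))
    return bce + lam * ce
```

Because the environment head only contributes a weighted auxiliary term, the shared features are pushed to encode environment context without the detection task being dominated by it; `lam` controls that trade-off.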
Keywords/Search Tags: Acoustic event detection, disentangled representation, environment-assisted, multitask learning