A Study Of Robust Acoustic Modeling Methods For Biological Sound Events

Posted on: 2022-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: T T Tang
GTID: 2518306746968709
Subject: Information and Communication Engineering
Abstract/Summary:
Sound event detection (SED) refers to detecting whether a target event occurs in a given audio recording and, further, detecting the onset and offset of that event. SED technology brings considerable convenience to daily life and work, and its application in environmental protection, medical treatment, security, water conservancy, transportation and other fields greatly reduces labor costs. As demand grows, more and more researchers are paying attention to SED. Living organisms are an important part of the ecosystem, so bioacoustic event detection (BED) has attracted increasing attention. This thesis focuses on the robustness of BED.

In recent years, deep learning for SED has developed rapidly; however, its data-driven nature leads to poor robustness. This shows up mainly in two ways: domain mismatch and lack of data. The source domain refers to the training dataset, and the target domain refers to the test dataset. The coverage of the source domain is limited, while target-domain scenarios vary widely, so the model generalizes poorly. Animals are active in diverse environments, so domain mismatch is widespread in BED. Take bird audio detection as an example: when bird calls are used to monitor environmental change, the recordings that can be collected often come from a different location, and the source and target domains differ in background noise, bird species, recording equipment and other respects. In addition, recordings can be difficult to obtain or costly to annotate. Detecting rare animal sounds is of great significance for wildlife conservation, but such sounds are usually difficult to record over long periods, so the model has to be trained with only a few samples. This thesis investigates these two issues in bird audio detection and few-shot BED, respectively.

To address domain mismatch in bird audio detection, this thesis proposes solutions at three levels. At the data level, a time-domain cross-condition data augmentation (TCDA) method is proposed to enrich the source domain and narrow the domain differences. At the feature level, robust percussive features (RPFs) are proposed to make bird call representations more distinguishable. At the model-training level, a CNN-based discriminative training method with refined training objectives is proposed; it improves the distinction between bird calls and the background environment and therefore weakens the domain differences.

For few-shot BED, this thesis proposes a two-stage training method to enhance system generalization. First, a prototypical network training framework based on a residual network (ResNet) is built and pre-trained on AudioSet. Next, in the fine-tuning stage, inference-time data augmentation and embedding propagation (EP) are used to reduce over-fitting. Finally, in the detection stage, an averaging method is used to improve stability.

The proposed methods are validated on the DCASE 2018 Task 3 and DCASE 2021 Task 5 datasets, respectively. On DCASE 2018 Task 3, the TCDA method obtains a 5.02% absolute AUC improvement and RPFs obtain a 3.3% absolute AUC improvement. Normalizing the RPFs with PCEN yields a further 4.22% absolute AUC improvement. The CNN-based discriminative training method achieves a 1.8-12.1% AUC improvement on cross-domain datasets. On the DCASE 2021 Task 5 dataset, the proposed robust acoustic modeling method for few-shot BED achieves a 9.28% absolute F-measure improvement.
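The few-shot part of the abstract relies on a prototypical network. As a rough illustration of that general technique only, the sketch below computes one prototype per class as the mean of its support embeddings and assigns a query to the nearest prototype. It is not the thesis's implementation: the embeddings are hard-coded toy vectors standing in for ResNet outputs, and the function names (`class_prototypes`, `classify`) and the labels `"call"` and `"background"` are hypothetical.

```python
import math


def class_prototypes(support_embeddings, support_labels):
    """Return {label: mean embedding} over the labelled support examples."""
    grouped = {}
    for vector, label in zip(support_embeddings, support_labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return (nearest label, softmax over negative squared distances)."""
    distances = {
        label: squared_distance(query, proto)
        for label, proto in prototypes.items()
    }
    # Subtract the smallest distance before exponentiating for stability.
    smallest = min(distances.values())
    weights = {
        label: math.exp(-(d - smallest)) for label, d in distances.items()
    }
    total = sum(weights.values())
    probabilities = {label: w / total for label, w in weights.items()}
    return min(distances, key=distances.get), probabilities


# Toy 2-D embeddings (assumed values, two support examples per class).
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = ["background", "background", "call", "call"]

prototypes = class_prototypes(support, labels)
predicted_a, probs_a = classify([0.1, 0.1], prototypes)
predicted_b, probs_b = classify([0.9, 1.0], prototypes)
print(predicted_a, predicted_b)
```

In a real few-shot detector the support examples would be the few annotated events of a recording and the queries would be embeddings of later audio frames; the sketch leaves out the pre-training, fine-tuning and averaging steps that the abstract describes.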
Keywords/Search Tags:Bird audio detection, Few-shot, Domain mismatch, Prototypical network