
Research On Sound Event Classification And Detection Method Based On Semi-supervised Learning

Posted on: 2022-11-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y H Liang
GTID: 2518306746968679 | Subject: Information and Communication Engineering
Abstract/Summary:
Sound event detection (SED) is a key technology for audio environment monitoring, smart homes, and intelligent assisted driving, and in recent years it has become one of the research hotspots of intelligent sound signal processing. With the development of deep learning and the advent of big data, modeling sound event detection systems with deep neural networks has become the focus of many researchers, and building a lightweight, intelligent sound event detection system is both urgent and important. Although great progress has been made in sound event detection, difficulties and challenges remain in the following aspects:
(1) Strongly labeled data with timestamps are severely lacking, so a model must be learned from a small amount of weakly labeled data and a large amount of unlabeled data;
(2) Audio collected in complex and changeable environments usually contains overlapping events and noise interference, which makes it challenging to build a detection system with high accuracy and robustness;
(3) As application scenarios diversify, more and more of them impose requirements on model complexity, so building a lightweight sound event detection system is also an important problem.

This paper focuses on these three difficulties and makes four main contributions:
(1) To address the scarcity of strongly labeled data, we build a teacher-student semi-supervised learning framework based on a convolutional recurrent neural network (CRNN) model, making full use of strongly labeled data, weakly labeled data, and a large amount of unlabeled data to train models effectively;
(2) To reduce model complexity, we build a complex teacher model to guide the training of a lightweight student model, and perform inference with the lightweight student model only;
(3) To improve the performance of the detection system in complex scenes and the efficiency of model training, we propose deep feature distillation, adaptive focal loss learning, a multi-stage model training strategy, and post-processing techniques;
(4) To alleviate the influence of event overlap and background noise on system modeling, we use sound separation to assist the modeling of the sound event detection system, jointly training the model on separated and mixed data.
Finally, we propose a multi-model score fusion strategy based on event-class discrimination, which exploits the complementary information between different models to further improve the overall performance of the system.

We verify the effectiveness of these techniques with experiments on the DCASE 2019 Task 4 and DCASE 2021 Task 4 datasets. The results show that deep feature distillation, adaptive focal loss learning, post-processing, and the other techniques significantly improve system performance. On the DCASE 2019 Task 4 dataset, the sound event detection system based on the teacher-student structure achieves an event-based F1-score of 51.3%, a segment-based F1-score of 76.7%, and an audio tagging (AT) F1-score of 83.1%, which are improvements of 8.0%, 19.6%, and 9.0%, respectively, over the first-place system in the DCASE 2019 Task 4 evaluation. The proposed use of sound separation to assist sound event detection modeling also brings large performance gains: compared with the baseline system on the DCASE 2021 Task 4 dataset, the event-based F1-score, PSDS1, and PSDS2 improve by 4.3%, 8.6%, and 20.9%, respectively. In addition, the score fusion method based on event-class discrimination significantly improves the performance of the sound event detection system.

In conclusion, this paper first proposes a series of techniques to improve the sound event detection system based on the teacher-student model structure, and then proposes a sound event detection system assisted by sound separation to further improve SED performance. Extensive experiments analyze the effectiveness of all the proposed methods, and the proposed techniques are compared with related state-of-the-art techniques in the literature. The paper ends with a summary of the work and directions for future research.
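The abstract does not give the loss functions behind its teacher-student framework, so the following is only a minimal NumPy sketch of one common way such a loss can be assembled, assuming a frozen, pre-trained complex teacher and a lightweight student. It combines frame-level BCE on strongly labeled clips, clip-level BCE on weakly labeled clips (using mean pooling over time, an assumed choice), soft-label distillation toward the teacher's frame probabilities on every clip (which is how unlabeled clips contribute), and feature distillation through a linear projection. The names `student_loss`, `proj`, `w_soft`, and `w_feat` are hypothetical, and the thesis's adaptive focal loss and multi-stage training schedule are not represented.

```python
import numpy as np


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy; `target` may be hard (0/1) or soft (teacher probabilities)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def student_loss(student_probs, teacher_probs, student_feat, teacher_feat, proj,
                 strong_mask, strong_labels, weak_mask, weak_labels,
                 w_soft=1.0, w_feat=0.5):
    """Hypothetical semi-supervised distillation loss for a lightweight student.

    student_probs, teacher_probs: (batch, frames, classes) frame-level probabilities.
    student_feat: (batch, frames, d_s); teacher_feat: (batch, frames, d_t).
    proj: (d_s, d_t) projection aligning student features with the teacher's (assumed).
    strong_mask / weak_mask: boolean (batch,) flags; unlabeled clips have neither.
    strong_labels: (n_strong, frames, classes); weak_labels: (n_weak, classes).
    """
    parts = {"strong": 0.0, "weak": 0.0}
    if strong_mask.any():
        # Frame-level supervision for clips with timestamped (strong) labels.
        parts["strong"] = bce(student_probs[strong_mask], strong_labels)
    if weak_mask.any():
        # Clip-level supervision: mean-pool frames into one prediction per clip (assumed pooling).
        parts["weak"] = bce(student_probs[weak_mask].mean(axis=1), weak_labels)
    # Soft-label distillation on all clips, labeled or not: the frozen teacher's
    # frame probabilities serve as targets for the student.
    parts["soft"] = bce(student_probs, teacher_probs)
    # Feature distillation: squared error between projected student and teacher features.
    parts["feat"] = float(np.mean((student_feat @ proj - teacher_feat) ** 2))
    total = parts["strong"] + parts["weak"] + w_soft * parts["soft"] + w_feat * parts["feat"]
    return total, parts
```

Returning the individual terms alongside the total makes it easy to monitor each one during training; in a real system these would be computed with an autodiff framework so the student and `proj` can be updated, and the loss weights would be tuned rather than fixed at the placeholder values above.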
Keywords/Search Tags:Sound event classification and detection, Semi-supervised learning, Feature distillation, Speech separation, Score fusion
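The abstract lists post-processing among the proposed techniques without describing it. A widely used form of SED post-processing, shown here only as an assumed stand-in for whatever the thesis actually does, is to threshold the frame-level probabilities, smooth each class's activity with a median filter so that isolated spikes are removed and short gaps inside an event are filled, and then convert each remaining run of active frames into a (label, onset, offset) event. The function names and the `threshold`, `window`, and `hop_seconds` parameters are hypothetical, not values from the thesis.

```python
import numpy as np


def median_filter_1d(x, window):
    """Median-filter a 1-D array along time; `window` must be odd. Edges are padded by repetition."""
    if window % 2 != 1:
        raise ValueError("window must be odd")
    half = window // 2
    padded = np.pad(x, (half, half), mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(x))])


def decode_events(frame_probs, class_names, threshold, window, hop_seconds):
    """Convert (frames, classes) probabilities into (label, onset_s, offset_s) events.

    threshold, window and hop_seconds are hypothetical tuning parameters.
    """
    events = []
    for c, name in enumerate(class_names):
        # 1) Binarize the class activity with a fixed threshold.
        active = (frame_probs[:, c] > threshold).astype(float)
        # 2) Smooth: with an odd window over 0/1 values the median is 0 or 1, so
        #    short spikes are removed and short gaps inside an event are filled.
        smoothed = median_filter_1d(active, window) > 0.5
        # 3) Locate run boundaries: +1 marks an onset frame, -1 the first frame after an event.
        edges = np.diff(np.concatenate(([0], smoothed.astype(int), [0])))
        onsets = np.flatnonzero(edges == 1)
        offsets = np.flatnonzero(edges == -1)
        for on, off in zip(onsets, offsets):
            events.append((name, on * hop_seconds, off * hop_seconds))
    return events
```

In practice the threshold and the median window are often tuned per event class, because short events such as a dog bark and long events such as running water tolerate very different amounts of smoothing.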