Research on Sound Event Classification and Detection Based on Semi-supervised Learning Methods

Posted on: 2021-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Yan
GTID: 2428330602998974
Subject: Information and Communication Engineering
Abstract/Summary:
Sound event detection is a technology for detecting the sound events in a piece of audio and locating them within it, whereas sound event classification only needs to obtain the event category information. This technology is an indispensable means of obtaining information about the surrounding environment, and it has unique advantages in certain settings, such as dark environments. In recent years, Google's release of the large-scale audio event data set AudioSet has made it possible to use neural networks for sound event detection. More and more researchers have begun to pay attention to this direction, and the development of sound event detection has entered a new phase.

However, sound event detection still faces many difficulties. On the one hand, sound events are complex and changeable, and multiple events may even occur at the same time, which places high demands on the detection system. On the other hand, fully labeled data sets are expensive and difficult to obtain. AudioSet is a weakly labeled data set containing only event categories, and it is very challenging to obtain a usable detection system from such data, or even from unlabeled data. At present, sound event detection technology is still at an early stage of development, and many problems remain to be solved before a truly practical and mature system is available.

This thesis focuses on the above two difficulties. First, a sound event classification and detection system based on a Convolutional Recurrent Neural Network (CRNN) is built and trained on weakly labeled data. On this basis, an attention mechanism is introduced to extract more discriminative features, strengthening effective features and suppressing useless ones. It not only extracts effective structural information from a local point of view, but also selects useful channel information from a global point of view. Experiments show that the attention mechanism effectively improves the performance of the system.

Then we propose sound event prototypes, which carry richer information than individual frames, to expand the field of view, and further propose multi-scale sound event prototypes: events with longer durations use large-scale prototypes, and events with shorter durations use small-scale prototypes. As a result, the system can extract feature representations that are more consistent with the characteristics of sound events. Experiments show that multi-scale sound event prototypes have a positive effect on event recognition.

Finally, the mean teacher semi-supervised learning method is used to exploit unlabeled data. We design a multi-task model in which the detection task and the classification task use different branches, alleviating the conflict between their feature requirements, so that the teacher model can produce more reliable learning targets to guide the student model. In addition, a data mixing technique that mixes both labeled and unlabeled data is proposed to expand the data range; it also serves as a data perturbation for better semi-supervised learning. Experiments show that system performance can be improved by using unlabeled data.
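
The general mean teacher idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual architecture, loss weights, or mixing scheme, so everything below is an assumption made for illustration: the function names `ema_update`, `consistency_loss`, and `mix` are hypothetical, the smoothing factor `alpha` is an arbitrary choice, and the mixup-style linear interpolation stands in for the unspecified data mixing technique. Plain Python lists are used in place of real network weights and spectrogram features.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move teacher weights toward student weights with an exponential
    moving average: teacher <- alpha * teacher + (1 - alpha) * student.
    In mean teacher training the teacher is not trained by gradient
    descent; it only tracks the student in this way."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions.
    It needs no labels, so it can be computed on unlabeled clips."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


def mix(x_a, x_b, lam):
    """Mixup-style linear interpolation of two feature (or target) vectors.
    ASSUMPTION: the abstract only says labeled and unlabeled data are mixed;
    this particular interpolation is a stand-in, not the thesis's method."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


if __name__ == "__main__":
    # Toy "weights" for a two-parameter model.
    teacher_w = [1.0, 0.0]
    student_w = [0.0, 1.0]
    teacher_w = ema_update(teacher_w, student_w, alpha=0.9)
    print("teacher weights after one EMA step:", teacher_w)

    # Toy clip-level class probabilities from the two models.
    print("consistency loss:", consistency_loss([0.8, 0.1], [0.6, 0.3]))

    # Mix a labeled and an unlabeled feature vector in equal parts.
    print("mixed features:", mix([1.0, 0.0], [0.0, 1.0], lam=0.5))
```

In a real system the lists would be network parameters and model outputs from a deep learning framework, and the consistency term would be added to the supervised loss computed on the weakly labeled clips.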
Keywords/Search Tags: Sound Event Detection, Sound Event Classification, Weakly Labeled, Attention, Semi-supervised Learning, Artificial Neural Network