
Research On Sound Event Detection Based On Deep Learning

Posted on: 2022-07-13    Degree: Master    Type: Thesis
Country: China    Candidate: Y L Liu    Full Text: PDF
GTID: 2518306341983209    Subject: Automation Technology
Abstract/Summary:
The main purpose of sound event detection is to identify the class of each sound event in an audio recording, together with its start and end times. In special environments such as darkness, adverse lighting, or visual blind spots, sound event detection can compensate for the shortcomings of visual detection methods, greatly improving detection reliability and playing to its unique advantages. As a hot topic in the field of computer audition, sound event detection has been widely applied in smart homes, security monitoring, smart medical care, biodiversity monitoring, and other fields, making life more convenient, safer, and more comfortable. It therefore has broad application prospects and important research value.

Research on sound event detection faces two main difficulties: multiple sound events may occur simultaneously, which makes polyphonic sound event detection difficult; and when the audio data lack start and end time annotations, improving the performance of weakly labeled sound event detection becomes another challenge. Aiming at these two difficulties, this thesis uses deep neural networks to study sound event detection. The main contributions are as follows:

(1) A capsule network with gate-structure dilated convolutions and residual connections is proposed to improve the performance of polyphonic sound event detection. First, to alleviate the insufficient feature extraction caused by the single convolution layer in the capsule network, a gate-structure dilated convolution module is proposed for deep feature extraction: the gated linear unit suppresses interference from irrelevant features, the dilated convolution layers capture time-frequency information with long-term context, and the residual connections effectively alleviate the vanishing-gradient problem. Second, the primary capsule layer encodes features in vector form, and the detection capsule layer serves as the classification structure; the dynamic routing algorithm learns the part-whole relationships between the two capsule layers to better identify overlapping sounds. Finally, a new mixup method for time-frequency spectrograms is proposed, which improves the generalization ability of the model. To verify the effectiveness of the proposed model, experiments were conducted on the TUT Sound Events 2017 dataset. The results on the evaluation set show that, compared with the baseline capsule network, the proposed model reduces ER by 19% and increases F1 by 3.9%; compared with other classic deep neural network models, it also obtains a lower ER and a higher F1, effectively improving the performance of polyphonic sound event detection.

(2) A convolutional recurrent network based on multi-scale feature fusion and an attention mechanism is proposed to improve the performance of weakly labeled sound event detection. First, a multi-scale attention module is proposed that combines local time-frequency attention with global channel attention, attending both to individual time-frequency feature units and to differently weighted channel features. Second, a multi-scale feature fusion method is proposed that fuses the features of different convolutional layers to obtain multi-scale feature maps of different dimensions. Then, two bidirectional gated recurrent units model the temporal dependencies, and a fully connected layer acts as the classifier. Finally, data balancing is used to expand the number of samples in under-represented classes and further improve the generalization of the model. To verify the effectiveness of the proposed model, experiments were carried out on a subset of AudioSet. The results on the evaluation set show that, compared with the baseline convolutional recurrent network, the proposed model reduces ER by 11% and increases F1 by 8.3%; compared with other methods on the same dataset, it is competitive and effectively improves the detection performance for weakly labeled sound events.
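As a rough illustration of two building blocks from contribution (1), the gated linear unit and mixup augmentation can be sketched as follows. This is a minimal NumPy sketch of the generic techniques only: the thesis's spectrogram-specific mixup variant is not described in the abstract, so the Beta(alpha, alpha) mixing weight, the `alpha` value, and all array shapes are assumptions.

```python
import numpy as np

def gated_linear_unit(x, axis=0):
    """Gated linear unit: split the channel axis in half and gate
    one half with the sigmoid of the other. In the gate-structure
    dilated convolution block this suppresses irrelevant features;
    here it is shown on a raw array for simplicity."""
    a, b = np.split(x, 2, axis=axis)
    return a * (1.0 / (1.0 + np.exp(-b)))

def mixup_spectrograms(spec_a, spec_b, label_a, label_b, alpha=0.2, rng=None):
    """Standard mixup applied to two time-frequency spectrograms and
    their label vectors (Zhang et al., 2018). The thesis proposes its
    own spectrogram-oriented variant, whose exact form is not given
    in the abstract; this is the generic recipe."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing weight drawn from Beta(alpha, alpha)
    mixed_spec = lam * spec_a + (1.0 - lam) * spec_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_spec, mixed_label, lam
```

Both mixed outputs are convex combinations of their inputs, so a spectrogram mixed from two normalized inputs stays in the same value range, and a pair of one-hot labels mixes into a valid soft-label distribution.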
Keywords/Search Tags:sound event detection, capsule network, convolutional recurrent network, attention mechanism, multi-scale feature fusion
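The global channel attention in the multi-scale attention module of contribution (2) follows the familiar squeeze-and-excitation pattern: globally pool each channel, pass the pooled vector through a small gating network, and reweight the channels. A minimal NumPy sketch, assuming a two-layer gating MLP whose weight shapes are hypothetical (the thesis's actual module layout is not specified in the abstract):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation-style global channel attention.

    features: (channels, time, freq) feature map.
    w1: (hidden, channels) and w2: (channels, hidden) -- weights of a
    two-layer gating MLP (hypothetical; shapes are assumptions).
    """
    squeezed = features.mean(axis=(1, 2))                 # global average pool -> (channels,)
    gate = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))   # ReLU hidden layer, sigmoid gate
    return features * gate[:, None, None]                 # reweight each channel map
```

Each channel is scaled by a gate in (0, 1), so informative channels can be emphasized and uninformative ones suppressed before the recurrent layers.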