Sound Event Detection (SED) aims to identify which sound events occur in a recording and when each one starts and ends. The task is difficult because ambient sounds tend to overlap with one another and are often embedded in complex background noise. Detection is therefore usually treated as a multi-label classification problem: events overlap when sounds of several categories occur in the same frame, and timestamps are obtained when a given class is detected continuously across consecutive frames. Consequently, SED can be viewed as a continuous multi-label classification problem. As with other classification problems, neural networks can extract patterns from audio data and have become the mainstream approach in academia. Among neural network models, the attention mechanism has received considerable interest and has been applied successfully in both Natural Language Processing and Sound Event Detection. It supports better decisions by computing a weighted sum over audio frames. This study builds on the attention mechanism and explores to what extent it can improve Sound Event Detection. Specifically, the contributions are:

1. Because ambient sound lacks inherent grammatical and semantic structure, computing attention over frames separated by long time intervals wastes memory. Since traditional attention-based SED has no mechanism for controlling memory, this study proposes a memory-controlled model. Evaluation on datasets from two scenarios demonstrates the effectiveness of this method.
2. The attention span for a given dataset is selected heuristically. This study proposes an adaptive mechanism that learns its optimal memory span. Experimental results show that this mechanism achieves results comparable to manual tuning.
3. Training uses a large amount of audio data, including synthetic strongly labeled, real-life weakly labeled, and unlabeled recordings. Using Multiple Instance Learning to leverage weakly and strongly labeled data, this study evaluates a set of pooling methods in two scenarios. The experiments show that attention pooling does not reach its full potential on DCASE Challenge 2021 Task 4, so feature-level attention pooling with a larger embedding space is proposed. The experiments indicate that even a small embedding space improves detection on all metrics.
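
To make the idea of a memory-controlled attention span in contribution 1 concrete, the following is a minimal sketch rather than the model used in this study. It assumes plain dot-product self-attention over frame features with no learned projections, and restricts each frame to attend only to frames at most `span` steps away. The function name `memory_controlled_attention` and the parameter `span` are hypothetical names chosen for illustration.

```python
import numpy as np

def memory_controlled_attention(frames, span):
    """Self-attention where each frame attends only to frames within `span` steps.

    frames: array of shape (T, d), one feature vector per audio frame.
    span:   non-negative int, maximum frame distance allowed (assumed fixed here).
    Returns (context, weights) with shapes (T, d) and (T, T).
    """
    T, d = frames.shape
    # Scaled dot-product similarity between every pair of frames.
    scores = frames @ frames.T / np.sqrt(d)
    # Mask out pairs of frames that are further apart than the span.
    idx = np.arange(T)
    outside = np.abs(idx[:, None] - idx[None, :]) > span
    scores = np.where(outside, -np.inf, scores)
    # Row-wise softmax; the diagonal is never masked, so each row has a finite max.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each output frame is a weighted sum of the frames inside its span.
    return weights @ frames, weights

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))   # 10 frames, 4 features each (toy data)
context, weights = memory_controlled_attention(frames, span=2)
```

With `span=2`, each row of `weights` has at most five non-zero entries, so the attention computation for a frame depends only on its local neighbourhood. An adaptive variant, as in contribution 2, would treat the span as a learned quantity instead of a fixed argument; that is not shown here.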