
Polyphonic Sound Event Detection Using Feature Space Attention And Temporal-frequency Attention

Posted on: 2024-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Jin
Full Text: PDF
GTID: 2568307157481594
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Polyphonic sound event detection aims to predict the classes of sound events and their start and end times, and is widely used in smart city transportation, smart homes, telemedicine, and other fields. Deep learning has become the dominant classification approach for polyphonic sound event detection. However, the following problems remain in practical scenarios. First, the Convolutional Neural Network (CNN) extracts many higher-order feature dimensions, so important feature dimensions can be drowned out. Second, for discontinuous and time-frequency-variable sound events, a single acoustic feature cannot comprehensively characterize the key information of polyphonic sound events. Finally, CNNs are prone to information loss during feature extraction, which prevents the model from accurately locating the start and end times of sound events. This paper carries out the following work to address these problems.

(1) To address the problem of important feature dimensions being drowned out, this paper uses a feature space attention mechanism to dynamically learn a weight for each dimension of the higher-order features: important feature dimensions receive larger weights, while less important ones receive smaller weights. Weighting the feature dimensions in this way captures the important feature space information. Experimental results show that the polyphonic sound event detection model based on feature space attention outperforms other sound event detection models.

(2) To address the problem that a single acoustic feature cannot comprehensively characterize the key information of polyphonic sound events, this paper improves the CNN with a dual-input temporal-frequency attention network and experimentally investigates how different feature fusions affect performance. The temporal-frequency attention module separately extracts the relevant time-frame information and the key frequency-band information of the features, improving the features' ability to characterize the time-frequency information of polyphonic sound events. Experimental results on public datasets show that the temporal-frequency attention-based polyphonic sound event detection model performs better, with F-scores 7.3% and 17.4% higher than the DCASE Task3 winning system model, and error rates (ER) 40% and 33% lower, respectively.

(3) To address the problem that the model cannot accurately locate the start and end times of sound events, this paper proposes a polyphonic sound event detection algorithm based on feature space attention and temporal-frequency attention (TFFS-CRNN). The temporal-frequency attention mechanism and the feature space attention mechanism are used to extract higher-order attention features containing relevant time-frame information, key frequency-band information, and important feature space information, thereby improving the features' ability to characterize the key information of polyphonic sound events. Finally, a Bidirectional Gated Recurrent Unit (BGRU) recurrent neural network learns contextual information, enabling the model to predict the start and end times of sound events more accurately. Experimental results on the public DCASE 2016 Task3 and DCASE 2017 Task3 datasets show that the F-score of the TFFS-CRNN model is 12.4% and 25.2% higher, and the ER is 41% and 37% lower, than the DCASE Task3 winning system model, respectively. Compared with existing polyphonic sound event detection models, the proposed algorithm achieves better classification performance and a lower error rate.
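The feature-space-attention idea in (1) can be illustrated with a minimal sketch: a per-dimension weight is computed from the features themselves (here via time-pooling, a hypothetical learned projection `w`, and a softmax) and used to rescale each feature dimension, so important dimensions are amplified. The shapes and the projection are assumptions for illustration, not the thesis's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feature_space_attention(features, w):
    """Weight each feature dimension by a dynamically computed score.

    features: (T, D) higher-order features (T time frames, D feature dims)
    w:        (D, D) hypothetical learned projection producing per-dim scores
    """
    pooled = features.mean(axis=0)      # (D,) summary of each dim over time
    scores = softmax(pooled @ w)        # (D,) one attention weight per dim
    return features * scores            # important dims weighted more

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 64))  # 100 frames, 64 feature dims
w = rng.standard_normal((64, 64)) * 0.1
out = feature_space_attention(feats, w)
print(out.shape)  # (100, 64)
```

In a trained model `w` would be learned end-to-end with the CNN; the sketch only shows how the weighting reshapes the feature space.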
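The temporal-frequency attention in (2) and (3) can likewise be sketched: separate attention scores are computed along the time axis (relevant time frames) and the frequency axis (key frequency bands) of a spectrogram-like feature map, then both are applied. Computing the scores by mean-pooling is an assumption for the sketch; the thesis's module would use learned layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_frequency_attention(spec):
    """Re-weight a (T, F) feature map along both time and frequency axes.

    spec: (T, F) log-mel-style features (T time frames, F frequency bands)
    """
    time_scores = softmax(spec.mean(axis=1))  # (T,) relevant time frames
    freq_scores = softmax(spec.mean(axis=0))  # (F,) key frequency bands
    # Broadcast the two score vectors over the map
    return spec * time_scores[:, None] * freq_scores[None, :]

rng = np.random.default_rng(1)
spec = rng.standard_normal((100, 40))  # 100 frames, 40 mel bands
out = temporal_frequency_attention(spec)
print(out.shape)  # (100, 40)
```

In the TFFS-CRNN pipeline such re-weighted maps would then feed the BGRU, which models context across time frames to sharpen onset/offset predictions.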
Keywords/Search Tags: Polyphonic sound event detection, Neural network, Dual-input temporal-frequency attention, Feature space attention