
Research On Audio Event Detection Method Based On Deep Learning

Posted on: 2021-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: B L Tang
GTID: 2428330623468110
Subject: Navigation, guidance and control
Abstract/Summary:
Audio event detection is the task of identifying the categories of audio events and their start and end times. It has a wide range of civil and industrial applications, such as security monitoring, biodiversity protection, multimedia search and recommendation, and scene perception. In real environments, audio events often overlap, which makes it difficult to detect both their categories and their start and end times. Although many research institutes in China and abroad have studied this problem, the field is young and research on it started relatively late, so current detection techniques are not yet mature and considerable room for research remains. In applications that require accurate time boundaries, detection methods rely mainly on supervised learning. Because the time boundaries of audio events can only be annotated manually, the datasets available for supervised learning are usually small, and building a high-performance deep learning model from a limited dataset is the main current difficulty. This thesis studies deep-learning-based audio event detection, focusing on its two main modules: hand-crafted feature extraction and the deep learning model. It studies the extraction of mel-frequency cepstral coefficients (MFCCs) and log mel band energies; building on the basic theory of deep learning, it studies convolutional neural networks (CNNs), recurrent neural networks (RNNs) and attention mechanisms; and it evaluates the proposed detection models on the street scene dataset. The three main research contributions are as follows:
(1) An audio event detection model based on a CNN was established. Four feature extraction schemes were compared experimentally, and the effect of the number of mel bands on detection performance was investigated. Using multi-channel, multi-length-window features based on log mel band energy, different models combining CNNs and RNNs were then compared experimentally. The best of these, BGNet, performs well, with an F1 score of 0.60 and an error rate (ER) of 0.63 (higher F1 and lower ER are better).
(2) The Squeeze-and-Excitation (SE) block was studied, and three improved models combining SE with BGNet were built. The best of these, BGNet-SE3, achieves an F1 score of 0.63 and an ER of 0.55. The experimental results show that SE significantly improves the performance of the audio event detection model.
(3) The Convolutional Block Attention Module (CBAM) was studied, and three attention mechanisms were tested separately: the channel attention module, the spatial attention module and the complete CBAM. On the BGNet architecture, three improved models were built for each of these mechanisms, nine in total. The best spatial-attention model, BGNet-SP2, achieves an F1 score of 0.63 and an ER of 0.56; the best channel-attention model, BGNet-CH3, achieves an F1 score of 0.64 and an ER of 0.57. Compared with other methods in this field, the proposed improved models show good detection performance.
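The abstract names MFCCs and log mel band energy as the hand-crafted features, and mentions multi-channel, multi-length-window features built from log mel band energy, without giving the extraction steps. The NumPy sketch below shows one common way to compute them: Hann-windowed framing, power spectrum, triangular mel filterbank, logarithm and, for MFCCs, an orthonormal DCT-II. It is a minimal sketch under assumptions: the function names and all parameter values (2048-sample frames, 1024-sample hop, 40 mel bands, 20 coefficients, the HTK mel formula) are illustrative, not the thesis's settings, and reading "multi-length window" as log mel energies from several FFT window lengths stacked as input channels is an interpretation, not a confirmed description.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (one common convention; others exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft // 2 + 1 FFT bins onto n_mels bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points evenly spaced on the mel axis give the triangle edges
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_band_energy(x, sr, n_fft=2048, hop=1024, n_mels=40, eps=1e-10):
    """Return an (n_frames, n_mels) matrix of log mel band energies."""
    # Zero-pad so that frame i is centred on sample i * hop for any even n_fft
    x = np.pad(np.asarray(x, dtype=float), n_fft // 2)
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    return np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + eps)


def multi_window_log_mel(x, sr, n_ffts=(1024, 2048, 4096), hop=1024, n_mels=40):
    """Hypothetical helper (one reading of "multi-length window" features):
    stack log mel energies from several window lengths as channels,
    giving shape (len(n_ffts), n_frames, n_mels)."""
    return np.stack([log_mel_band_energy(x, sr, n_fft=n, hop=hop, n_mels=n_mels)
                     for n in n_ffts])


def mfcc(x, sr, n_mfcc=20, **kwargs):
    """MFCCs: orthonormal DCT-II of the log mel band energies over the band axis."""
    logmel = log_mel_band_energy(x, sr, **kwargs)
    n_mels = logmel.shape[1]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] /= np.sqrt(2.0)  # orthonormal scaling of the zeroth coefficient
    return logmel @ dct.T
```

Centre padding keeps the frames of every window length aligned on the same sample positions, so the channels share one frame count and can be fed to a CNN as a single (channels, time, mel) array.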
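The Squeeze-and-Excitation block recalibrates the channels of a CNN feature map: it averages each channel over time and frequency (squeeze), passes the resulting vector through two fully connected layers with a ReLU and a sigmoid (excitation), and multiplies each channel by its weight. The NumPy forward pass below is a minimal sketch assuming a (channels, time, frequency) layout, a reduction ratio r = 4 and random weights; the sizes are hypothetical, and where the thesis inserts the block inside BGNet for its three SE variants is not stated in the abstract.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def squeeze_excitation(x, w1, b1, w2, b2):
    """SE recalibration of a feature map x with shape (C, T, F).

    w1: (C // r, C) and w2: (C, C // r) are the weights of the two
    fully connected layers; r is the channel reduction ratio.
    Returns the rescaled map and the per-channel weights.
    """
    z = x.mean(axis=(1, 2))            # squeeze: one value per channel, (C,)
    h = np.maximum(0.0, w1 @ z + b1)   # FC + ReLU, (C // r,)
    s = sigmoid(w2 @ h + b2)           # FC + sigmoid, (C,), each in (0, 1)
    return x * s[:, None, None], s     # scale each channel by its weight


# Hypothetical sizes: 16 channels, 100 frames, 40 mel bands, r = 4
rng = np.random.default_rng(0)
C, T, F, r = 16, 100, 40, 4
x = rng.standard_normal((C, T, F))
w1, b1 = 0.1 * rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = 0.1 * rng.standard_normal((C, C // r)), np.zeros(C)
y, s = squeeze_excitation(x, w1, b1, w2, b2)
```

The block adds only 2C²/r weights (plus biases), which is why it is cheap to place after convolutional layers.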
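CBAM applies channel attention and then spatial attention. Channel attention passes the average-pooled and max-pooled channel descriptors through a shared two-layer MLP, sums the outputs and applies a sigmoid; spatial attention pools across channels (average and max), convolves the two resulting maps with a k×k kernel (7×7 in the original CBAM paper) and applies a sigmoid. The NumPy forward pass below is a minimal sketch assuming a (channels, time, frequency) layout and omitting the MLP biases; using the channel or spatial module on its own, as the thesis's BGNet-CH and BGNet-SP variants appear to do, is shown only by calling the functions separately, since the abstract does not describe the implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def shared_mlp(v, w1, w2):
    # Two-layer MLP with ReLU; biases omitted for brevity (an assumption)
    return w2 @ np.maximum(0.0, w1 @ v)


def channel_attention(x, w1, w2):
    """Per-channel weights (C,) for a feature map x of shape (C, T, F)."""
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    return sigmoid(shared_mlp(avg, w1, w2) + shared_mlp(mx, w1, w2))


def conv2d_same(maps, kernel):
    """Cross-correlate (2, T, F) maps with a (2, k, k) kernel (odd k), zero 'same' padding."""
    k = kernel.shape[-1]
    p = k // 2
    T, F = maps.shape[1:]
    padded = np.pad(maps, ((0, 0), (p, p), (p, p)))
    out = np.zeros((T, F))
    for i in range(k):
        for j in range(k):
            # weight the shifted maps by kernel[:, i, j] and sum over the 2 inputs
            out += np.tensordot(kernel[:, i, j], padded[:, i:i + T, j:j + F], axes=1)
    return out


def spatial_attention(x, kernel):
    """Per-position weights (T, F) from the channel-wise average and max maps."""
    maps = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, T, F)
    return sigmoid(conv2d_same(maps, kernel))


def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]
```

Channel attention reweights whole feature maps, while spatial attention reweights individual time-frequency positions, which is why the two modules can be evaluated separately or in sequence.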
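The abstract reports F1 and ER without defining them. Assuming the segment-based definitions used in the DCASE sound event detection evaluations (fixed-length segments, typically 1 s, with counts accumulated over all segments), F1 = 2TP / (2TP + FP + FN); in each segment, substitutions S = min(FN, FP), deletions D = max(0, FN - FP) and insertions I = max(0, FP - FN), and ER = (ΣS + ΣD + ΣI) / N, where N is the number of active reference events summed over segments. Higher F1 and lower ER are better, and ER can exceed 1. The sketch below implements these definitions; the helper names, the (onset, offset, label) tuple format and the class labels in the test are hypothetical, and whether the thesis used segment-based or event-based scoring is not stated in the abstract.

```python
import numpy as np


def events_to_segments(events, labels, duration, seg_len=1.0):
    """Binary activity matrix (n_segments, n_classes) from (onset, offset, label)
    tuples; a class is active in a segment if any of its events overlaps it."""
    n_segments = int(np.ceil(duration / seg_len))
    act = np.zeros((n_segments, len(labels)), dtype=bool)
    for onset, offset, label in events:
        first = int(np.floor(onset / seg_len))
        last = min(int(np.ceil(offset / seg_len)), n_segments)
        act[first:last, labels.index(label)] = True
    return act


def segment_f1_er(reference, estimated):
    """Segment-based F1 score and error rate from two binary activity matrices."""
    ref = np.asarray(reference, dtype=bool)
    est = np.asarray(estimated, dtype=bool)
    tp = np.sum(ref & est)
    fp_k = np.sum(est & ~ref, axis=1)  # false positives per segment
    fn_k = np.sum(ref & ~est, axis=1)  # false negatives per segment
    denom = 2 * tp + fp_k.sum() + fn_k.sum()
    f1 = 2 * tp / denom if denom > 0 else 0.0
    s = np.minimum(fn_k, fp_k).sum()        # substitutions
    d = np.maximum(0, fn_k - fp_k).sum()    # deletions
    i = np.maximum(0, fp_k - fn_k).sum()    # insertions
    n = ref.sum()                           # active reference events
    er = (s + d + i) / n if n > 0 else float("nan")
    return float(f1), float(er)
```

Because a wrong class in a segment is counted as one substitution rather than a deletion plus an insertion, it adds 1 to the ER numerator instead of 2.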
Keywords/Search Tags: Audio event detection, Deep learning, Neural networks, Attention