
Research On Deep Neural Network Model Of Audio Tagging

Posted on: 2021-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: L Cui
GTID: 2568306104464074
Subject: Engineering

Abstract:
In recent years, with the success of deep learning in speech recognition and image processing, audio tagging has also attracted increasing attention. With the spread of intelligent mobile devices, large numbers of users upload recordings to the network every day, so automatically labeling audio has become an important research direction. Traditional hand-crafted features and shallow-structured classifiers require a great deal of work, and they cannot make good use of the latent relationships between contextual information and the different sound event classes. To address these problems, this thesis applies deep neural network methods to audio tagging and investigates their impact on accuracy and performance.

Firstly, a learnable context gate helps select the features most relevant to the final audio event class, while an attention mechanism helps the model focus on the audio frames most relevant to that class. Context gating and attention are therefore introduced into a convolutional recurrent neural network to form the attention-gated convolutional recurrent neural network (AT-GCRNN). AT-GCRNN is applied to general audio tagging and compared with a convolutional neural network (CNN) and a convolutional recurrent neural network (CRNN); the experimental results show that AT-GCRNN achieves higher tagging accuracy than both.

Secondly, a time-frequency segmentation mask network can separate sound events from the background scene in the time-frequency domain and enhance the sound events in an audio clip. Compared with a plain CNN, MobileNetV2 reduces the number of network parameters, and Res2Net enlarges the receptive field of each network layer. The improved time-frequency segmentation network is therefore used to model urban sound tagging and is compared against VGGNet and a CNN; the results show that the improved model is both faster and more accurate than the other networks.

Finally, a deep neural network framework combining atrous convolution with Res2NeXt is constructed and applied to urban sound tagging, and is compared with the VGGNet network and the modified MobileNetV2 model. Atrous convolution captures multi-scale context information, and Res2NeXt improves on Res2Net, raising classification accuracy while reducing the number of hyperparameters. The results show that the classification performance of this model is better than that of the other two networks.
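As a loose illustration of the context-gating and attention ideas summarized above (a minimal sketch under assumed shapes, not the thesis's actual AT-GCRNN architecture; the function names, dimensions, and random inputs are all invented for this example), a gate produces sigmoid weights from the input features and multiplies them element-wise, while attention pooling weights each frame by a softmax over per-frame relevance scores:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_gate(features, W, b):
    """Element-wise gating: y = features * sigmoid(W @ features + b).

    The sigmoid output in (0, 1) acts as a learned per-feature weight,
    suppressing features irrelevant to the target event class.
    """
    gate = sigmoid(W @ features + b)
    return features * gate

def attention_pool(frame_scores, frame_values):
    """Attention pooling over time: softmax-weight each frame's value
    vector by its relevance score, so the frames most related to the
    event class dominate the clip-level representation."""
    weights = np.exp(frame_scores - frame_scores.max())
    weights /= weights.sum()
    return (weights[:, None] * frame_values).sum(axis=0)

# Toy usage with random numbers (illustrative only).
rng = np.random.default_rng(0)
feats = rng.normal(size=8)
W, b = rng.normal(size=(8, 8)), np.zeros(8)
gated = context_gate(feats, W, b)                    # same shape as feats
pooled = attention_pool(rng.normal(size=5),          # 5 frame scores
                        rng.normal(size=(5, 8)))     # 5 frames x 8 features
```

In a full model these operations would sit on top of CNN/RNN feature maps; here they act on plain vectors only to show the gating and pooling arithmetic.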
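The atrous (dilated) convolution mentioned in the final paragraph enlarges the receptive field without adding parameters by sampling the input with gaps between kernel taps. A minimal 1-D sketch (the kernel values and dilation rates below are made up for illustration):

```python
import numpy as np

def atrous_conv1d(signal, kernel, dilation):
    """1-D atrous (dilated) convolution with 'valid' padding.

    With dilation d, a kernel of length k spans d*(k-1)+1 input samples,
    so larger dilations see wider context at no extra parameter cost.
    """
    k = len(kernel)
    span = dilation * (k - 1) + 1
    out_len = len(signal) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        # Sample the input every `dilation` steps under the kernel.
        out[i] = sum(kernel[j] * signal[i + j * dilation] for j in range(k))
    return out

x = np.arange(10, dtype=float)        # toy input signal
k = np.array([1.0, 1.0, 1.0])         # toy 3-tap kernel
y1 = atrous_conv1d(x, k, dilation=1)  # spans 3 samples (ordinary conv)
y2 = atrous_conv1d(x, k, dilation=2)  # spans 5 samples: wider context
```

Stacking layers with increasing dilation rates is how such networks capture multi-scale context, which is the property the thesis exploits for urban sound tagging.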
Keywords: Audio tagging, Deep learning, Convolutional recurrent neural network, Time-frequency segmentation network, Atrous convolution