Font Size: a A A

Audio Scene Recognition Based On Deep Neural Network Of Multiple Optimization Mechanisms

Posted on:2022-05-31Degree:MasterType:Thesis
Country:ChinaCandidate:J T HuFull Text:PDF
GTID:2518306515972399Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
Acoustic Scene Understanding(ASU)is one of the future research hotspots in the field of artificial intelligence.It recognizes the semantic tags of the corresponding scene based on the content of the acoustic scene,so as to understand the surrounding environment.Audio Scene Recognition(ASR)is an extremely critical link in the field of acoustic scene understanding,and it is the premise and foundation for realizing acoustic scene understanding.Audio scene recognition judges the category of the environment through audio information and guides the machine to perceive the dynamics of the environment.It is one of the key technologies for processing,analyzing and applying audio information.In the massive audio database,nonvoice scene audio occupies a large proportion,and in the actual application process,the audio content and categories involved are extremely complex,so an accurate and convenient audio scene recognition technology is needed.With the continuous increase of paralleling computing capabilities,deep neural networks have shown excellent performance in the field of audio scene recognition,but traditional deep learning algorithms are difficult to continue to improve after they have achieved certain results in the field of audio scene recognition.Therefore,this article uses deep learning algorithms and many A way of organic integration of optimization mechanisms to improve the recognition effect.In this paper,Convolutional Neural Networks(CNN)and Bidirectional Gated Recurrent Unit(Bi-GRU)networks are used to construct a basic paralleling framework,and then Batch Normalization(BN)and hierarchical attention are integrated.The Hierarchical Attention Networks(HAN)generation model solves the problem of audio scene recognition,which is called a paralleling convolutional recurrent neural network model based on multiple optimization mechanisms.Paralleling convolutional recurrent neural networks have powerful spatial and temporal feature learning capabilities,which can effectively extract audio feature parameters;batch standardization mechanism not only stabilizes the distribution of input data in each layer of the network,speeds up the convergence speed of the model,but also simplifies the network The parameter tuning process of the model makes the network more stable during the training process;the hierarchical attention mechanism can assign attention weights to different time frames in a given audio feature vector,allowing the model to "pay more attention" to those who recognize The key frame that plays an important role,thereby improving the recognition performance of the network model.The organic integration of the above optimization mechanisms greatly enhances the recognition performance of the network model,so that the audio scene recognition task can be completed well.In this task,the audio signal is first converted into a mel spectrogram of a certain size after preprocessing,and then input into the network model for sufficient spatial and temporal feature learning,and finally the recognition task is completed.In order to verify the effectiveness of the model,the recognition performance test was carried out on the DCASE2019 audio scene data set.The paper uses multiple sets of experiments to compare the paralleling convolutional recurrent neural network,the paralleling network structure that integrates a single optimization mechanism,and the paralleling network that integrates multiple optimization mechanisms.The effectiveness of the structure was verified,and the recognition accuracy of the model reached 88.84%.The experimental results showed that the paralleling convolutional recurrent neural network based on the multi-optimization mechanism proposed in this paper is better than the traditional neural network model and can complete the audio scene recognition task well.And has good stability and generalization ability.
Keywords/Search Tags:Audio scene recognition, Convolutional neural network, Batch normalization, Bidirectional gated recurrent unit, Hierarchical attention network
PDF Full Text Request
Related items