Residual Network And Its Variant Network For Acoustic Scene Classification

Posted on: 2021-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: M Ye
Full Text: PDF
GTID: 2428330620465546
Subject: Computer technology
Abstract/Summary:
Acoustic Scene Classification (ASC) assigns a recording made in a public area to one of several predefined acoustic scene categories, for example determining that the audio was recorded in a "park", a "walking street" or a "metro station". ASC can be widely used in mobile devices and intelligent robots. With the development of deep learning in recent years, more and more deep learning techniques have been applied to ASC. Among the neural networks used for deep learning, convolutional neural networks (CNNs) have been studied most thoroughly. Researchers have built deeper and deeper CNNs, such as GoogLeNet with 22 layers and the residual network with 152 layers. Increasing the depth gives the network more non-linearity with which to approximate the objective function, and therefore better feature representations. This thesis studies ASC based on the residual network and its variants. The main contributions are as follows.

(1) An ASC algorithm using a Residual Attention Network with micro-batch training is proposed. The Residual Attention Network, which performs well on ImageNet classification, is applied to ASC with mismatched recording devices and trained with micro-batches. The input of the network is modified so that it operates on log-mel spectrograms of the audio. To further improve performance without requiring powerful hardware or large memory, the difficulties of micro-batch training are addressed with switchable normalization and weight standardization. A 4-layer CNN and an 8-layer CNN are used as baseline systems. On the TUT Urban Acoustic Scenes 2018 Mobile dataset, the best setting found in the experiments achieves 58.6% accuracy, improving class-wise accuracy by 1.1% over the 4-layer CNN baseline and by 1.4% over the 8-layer CNN baseline.

(2) An ASC algorithm using a residual network with transfer learning is proposed. Transfer learning is used to fine-tune a residual network pre-trained on the ImageNet dataset so that it is suitable for ASC. In addition, focal loss is used to improve overall performance, and mixup data augmentation is applied to reduce the chance of overfitting. A Squeeze-and-Excitation Residual Network trained without transfer learning is used as the baseline system. On the TUT Urban Acoustic Scenes 2018 dataset, the best setting found in the experiments achieves 74.7% accuracy, improving class-wise accuracy by 2.2% over the baseline system.
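For readers unfamiliar with the two training techniques named in contribution (2), the following is a minimal sketch of mixup augmentation and focal loss in plain Python. It is not the thesis's implementation: the function names, the toy feature vectors, and the values `alpha=0.2` and `gamma=2.0` are illustrative assumptions, since the abstract does not state the hyperparameters used.

```python
import math
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels.

    The mixing weight lam is drawn from Beta(alpha, alpha).
    alpha=0.2 is an assumed value, not taken from the thesis.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, eps=1e-12):
    """Multi-class focal loss for one example.

    target may be one-hot or a soft label produced by mixup.
    Each class term is -t * (1 - p)^gamma * log(p); gamma=0 gives
    ordinary cross-entropy. gamma=2.0 is an assumed value.
    """
    probs = softmax(logits)
    return -sum(
        t * (1.0 - p) ** gamma * math.log(p + eps)
        for p, t in zip(probs, target)
    )


if __name__ == "__main__":
    # Toy two-class example with hypothetical feature vectors.
    x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
    print("mixed label:", y)
    print("focal loss:", focal_loss([2.0, 0.0], y))
```

Because mixup produces soft labels, the loss here sums over all classes rather than picking out a single true class. Setting `gamma` to 0 recovers standard cross-entropy, which makes it easy to compare the two losses on the same data.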
Keywords/Search Tags: Acoustic Scene Classification, Residual Attention Network, transfer learning, residual network