Study Of Multi-Scale Features Fusion And Data Augmentation Methods For Acoustic Scene Classification

Posted on: 2020-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: X X Chen
Full Text: PDF
GTID: 2428330599953685
Subject: Instrument Science and Technology
Abstract/Summary:
Acoustic scene classification, one of the leading research directions in machine listening, aims to let a device assign a semantic label to an acoustic scene by analyzing the characteristics of the acoustic signal. The technique has broad application prospects in fields such as robot navigation and smart wearable devices. In recent years, deep convolutional neural networks have made breakthroughs in computer vision tasks such as object detection, segmentation, and recognition. Because these networks offer robust feature representation and pattern classification ability, this thesis studies acoustic scene classification methods based on deep convolutional neural networks.

The goal of acoustic scene classification is to obtain the semantic label of the environment. Although the feature maps of a deep convolutional neural network are semantic, classifying with only the output feature map of the last convolutional layer may degrade performance because detailed texture information is missing. In addition, training data for acoustic scene classification is limited, so deep convolutional neural networks overfit easily, which reduces the model's generalization ability. To alleviate these two issues, this thesis studies multi-scale feature fusion and data augmentation methods for acoustic scene classification based on the Xception framework. The contributions are as follows:

(1) A simple and effective multi-scale feature fusion method is proposed. The method fuses deep semantic features with shallow detail texture features to obtain a fused feature vector, which effectively improves classification performance. In addition, acoustic scene classification is a typical pattern classification problem in which samples near the category boundary are crucial. To emphasize the contribution of these boundary samples during training, this thesis extends the binary focal loss function to the multi-class case. Focal loss assigns a different weight to the loss of each sample so that the model focuses on samples near the classification boundary, which further improves model performance.

(2) A multi-scale feature fusion and channel weighting method is proposed. Multi-scale feature fusion exploits the feature hierarchy of the convolutional neural network and fuses feature maps of different scales to obtain a feature map that contains both global semantic information and local detail texture information. In addition, since different sound events play different roles in determining the acoustic scene category, different channels of the feature map can be assumed to contribute differently to the classification. This thesis therefore proposes a learning-based channel weighting method that learns a weight for each channel and then scales each channel by its weight, which effectively improves the classification performance of the model.

(3) To alleviate over-fitting in deep convolutional neural networks, a label smoothing mixup data augmentation method is proposed. Mixup is a simple and effective data augmentation method that alleviates over-fitting. However, deep convolutional neural networks also tend to be overconfident in their predictions, and label smoothing is a way to address this overconfidence. This thesis introduces label smoothing into mixup to form a label smoothing mixup method, which generates virtual training data by interpolation and smooths the labels of the virtual data. This effectively mitigates over-fitting and improves the generalization ability of the model.

Experiments are conducted on the acoustic scene classification dataset of the DCASE 2018 challenge. The results indicate that the multi-scale feature fusion and data augmentation methods proposed in this thesis effectively improve model performance, and that the classification accuracy exceeds the best result of the DCASE 2018 challenge.
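
The label smoothing mixup and the multi-class focal loss described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the thesis's implementation, since the abstract gives no formulas. The function names, the smoothing factor `epsilon`, the focusing parameter `gamma`, and the choice to smooth labels before mixing are all assumptions made for illustration. The focal loss shown is one common multi-class form and may differ from the one the thesis derives.

```python
import math
import random


def smooth_labels(one_hot, epsilon=0.1):
    """Standard label smoothing: move a fraction epsilon of the
    probability mass onto a uniform distribution over the classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def label_smoothing_mixup(x1, y1, x2, y2, lam, epsilon=0.1):
    """Build one virtual example by linear interpolation (mixup).

    x1, x2 are feature vectors, y1, y2 are one-hot labels, and lam is
    the mixing weight in [0, 1]. Labels are smoothed before mixing
    (an assumed order; the abstract does not specify it).
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    s1 = smooth_labels(y1, epsilon)
    s2 = smooth_labels(y2, epsilon)
    y = [lam * a + (1.0 - lam) * b for a, b in zip(s1, s2)]
    return x, y


def multiclass_focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample with a (possibly soft) target:
    -sum_c t_c * (1 - p_c)^gamma * log(p_c).

    With gamma = 0 this reduces to ordinary cross-entropy; larger
    gamma down-weights classes that are already predicted confidently.
    """
    return -sum(
        t * (1.0 - p) ** gamma * math.log(max(p, 1e-12))
        for p, t in zip(probs, target)
    )


if __name__ == "__main__":
    # Mixup conventionally draws lam from a Beta(alpha, alpha) distribution;
    # alpha = 0.2 is an illustrative value, not one taken from the thesis.
    lam = random.betavariate(0.2, 0.2)
    x, y = label_smoothing_mixup([1.0, 3.0], [1, 0], [3.0, 1.0], [0, 1], lam)
    print(x, y, multiclass_focal_loss([0.7, 0.3], y))
```

Because the mixed label is a soft distribution, the loss is written to accept soft targets, so the two pieces can be combined in one training step.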
Keywords/Search Tags: Acoustic Scene Classification, Convolutional Neural Network, Multi-Scale Features Fusion, Channels Weighted, Data Augmentation