
Feature Augmentation And Model Building For Acoustic Scene Classification With Multiple Devices

Posted on: 2022-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Liu
Full Text: PDF
GTID: 2518306764462754
Subject: Telecom Technology

Abstract/Summary:
An acoustic scene is the environment in which a sound is recorded, such as an airport or a park. In machine hearing, acoustic scene classification is the task of associating a semantic label with an audio stream, identifying the environment in which it was produced. Benefiting from advances in computing power and deep learning algorithms, acoustic scene classification under laboratory conditions has achieved excellent results. In practical applications, however, generalization to unknown scenes and unknown cities remains weak. Addressing this requires a large amount of audio data from the Internet, but such data always come from different recording devices. Audio recorded by different devices carries different device characteristics, and the data volume per device is unbalanced; both degrade the performance of acoustic scene classification systems. This thesis therefore focuses on cross-device acoustic scene classification.

First, feature augmentation is used to improve cross-device classification performance. Head-related transfer functions introduce a large number of frequency responses from different spatial angles while expanding the number of acoustic feature channels. Spectrum augmentation methods then transform the acoustic features along the time and frequency axes, and the mixup method mixes two different acoustic features to obtain a new virtual acoustic feature. With feature augmentation, the ensemble baseline model reaches a classification accuracy of 58.2%, an increase of 10.9% over the original baseline model.
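The mixup step described above can be sketched as follows. This is a minimal illustration, not the thesis's exact implementation; the feature shape, one-hot labels, and the Beta-distribution parameter `alpha=0.2` are assumptions for the example.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two (acoustic feature, one-hot label) pairs into one virtual
    training example, weighted by a Beta(alpha, alpha)-distributed lambda."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2   # convex combination of features
    y = lam * y1 + (1.0 - lam) * y2   # matching soft label
    return x, y

# Example: mix two 64-mel-band, 100-frame features from different scenes.
rng = np.random.default_rng(0)
x1, x2 = np.zeros((64, 100)), np.ones((64, 100))
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y = mixup(x1, y1, x2, y2, rng=rng)
```

Small `alpha` values keep most mixed samples close to one of the two originals, which is the usual choice for spectrogram-like features.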
The corresponding log loss is 1.119, which is 0.377 lower than the original baseline model.

Second, this thesis focuses on neural network models for cross-device acoustic scene classification. Since the two dimensions of acoustic features differ from the two spatial dimensions of images, three models are built: a VGGNet-based model, a two-path ResNet model, and a subspectral-normalization ResNet model. Training uses the focal loss function and a cosine-annealing learning-rate schedule after warmup. All three models outperform the baseline. Combined with the feature augmentation method, the subspectral-normalization ResNet model performs best, reaching a classification accuracy of 72.9%, an increase of 25.6% over the baseline, with a log loss of 0.802, which is 0.694 lower than the baseline.

Finally, this thesis explores low-complexity acoustic scene classification so that the network model can be deployed on resource-constrained devices. Feature-reuse convolution and a 1-bit weight-training method are used, so that each trained weight can be stored in a single bit, reducing the parameter count and storage space. To cooperate with low-complexity training, the feature augmentation method is optimized: head-related transfer functions and spectrum-correction augmentation balance the proportion of data in the training set and weaken the neural network model's sensitivity to the recording devices. The final model occupies 86 KB of storage, with a classification accuracy of 69.5%, an increase of 22.2% over the baseline, and a log loss of 0.913, which is 0.583 lower than the baseline.
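The focal loss used for training down-weights easy, well-classified examples. A minimal NumPy sketch is given below; the focusing parameter `gamma=2.0` is the common default, not a value stated in the abstract.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0):
    """Focal loss for one-hot targets: mean of -(1 - p_t)^gamma * log(p_t),
    where p_t is the predicted probability of the true class.
    With gamma = 0 this reduces to ordinary cross-entropy."""
    p_t = np.sum(probs * targets, axis=-1)
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))

# A confident correct prediction contributes far less than under cross-entropy.
probs = np.array([[0.9, 0.1]])
targets = np.array([[1.0, 0.0]])
fl = focal_loss(probs, targets, gamma=2.0)
ce = focal_loss(probs, targets, gamma=0.0)
```

Because (1 - p_t)^gamma shrinks toward zero as p_t approaches 1, training focuses on hard, device-mismatched examples.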
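The abstract does not detail the exact 1-bit weight-training scheme; a common approach (XNOR-Net-style scaled sign binarization) can be sketched as follows. The scaling by the mean absolute weight is an assumption of this sketch.

```python
import numpy as np

def binarize_weights(w):
    """Binarize float weights to alpha * {-1, +1}, where alpha is the mean
    absolute weight. Only the sign (1 bit per weight) plus one scalar alpha
    per tensor must be stored, versus 32 bits per float32 weight."""
    alpha = np.mean(np.abs(w))
    signs = np.sign(np.where(w == 0, 1.0, w))  # map exact zeros to +1
    return alpha * signs, alpha

w = np.array([[0.5, -0.3], [0.1, -0.7]])
w_bin, alpha = binarize_weights(w)
```

In such schemes the binarized weights are used in the forward and backward passes while full-precision weights are kept only for the gradient update, which is what allows the final stored model to shrink to the kilobyte range.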
Keywords/Search Tags: Cross-device Acoustic Scene Classification, Feature Augmentation, Convolutional Neural Network, Ensemble Learning, Low Complexity