
Feature Augmentation And Model Building For Acoustic Scene Classification With Multiple Devices

Posted on: 2022-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Liu
Full Text: PDF
GTID: 2518306764462754
Subject: Telecom Technology

Abstract/Summary:
An acoustic scene is the environment in which a sound is recorded, such as an airport or a park. In machine hearing, acoustic scene classification is the task of associating a semantic label with an audio stream, identifying the environment in which it was produced. Benefiting from advances in computing power and deep learning algorithms, acoustic scene classification under laboratory conditions has achieved excellent results. In practical applications, however, generalization to unknown scenes and unknown cities remains weak. Addressing this requires a large amount of audio data from the Internet, but such data always come from different recording devices. Audio recorded by different devices carries different device characteristics, and the data volume per device is unbalanced; both degrade the performance of acoustic scene classification systems. This thesis therefore focuses on cross-device acoustic scene classification.

First, feature augmentation is used to improve cross-device classification performance. Head-related transfer functions introduce a large number of frequency responses from different spatial angles while expanding the number of acoustic feature channels. Spectrum augmentation methods then transform the acoustic features along the time and frequency axes, and the mixup method mixes two different acoustic features to obtain a new virtual acoustic feature. With feature augmentation, the ensemble baseline model reaches a classification accuracy of 58.2%, an increase of 10.9% over the original baseline model.
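The mixup step described above can be sketched as follows. This is a minimal illustration, not the thesis's exact implementation; the feature shape, one-hot labels, and the Beta-distribution parameter `alpha=0.2` are assumptions for the example.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two (acoustic feature, one-hot label) pairs into one virtual
    training example, weighted by a Beta(alpha, alpha)-distributed lambda."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2   # convex combination of features
    y = lam * y1 + (1.0 - lam) * y2   # matching soft label
    return x, y

# Example: mix two 64-mel-band, 100-frame features from different scenes.
rng = np.random.default_rng(0)
x1, x2 = np.zeros((64, 100)), np.ones((64, 100))
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y = mixup(x1, y1, x2, y2, rng=rng)
```

Small `alpha` values keep most mixed samples close to one of the two originals, which is the usual choice for spectrogram-like features.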
The corresponding log loss is 1.119, which is 0.377 lower than the original baseline model.

Second, this thesis focuses on neural network models for cross-device acoustic scene classification. Since the two dimensions of acoustic features differ from the two spatial dimensions of images, three models are built: a VGGNet-based model, a two-path ResNet model, and a subspectral-normalization ResNet model. Training uses the focal loss function and a cosine-annealing learning-rate schedule after warmup. All three models outperform the baseline. Combined with the feature augmentation method, the subspectral-normalization ResNet model performs best, reaching a classification accuracy of 72.9%, an increase of 25.6% over the baseline, with a log loss of 0.802, which is 0.694 lower than the baseline.

Finally, this thesis explores low-complexity acoustic scene classification so that the network model can be deployed on resource-constrained devices. Feature-reuse convolution and a 1-bit weight-training method are used, so that each trained weight can be stored in a single bit, reducing the parameter count and storage space. To cooperate with low-complexity training, the feature augmentation method is optimized: head-related transfer functions and spectrum-correction augmentation balance the proportion of data in the training set and weaken the neural network model's sensitivity to the recording devices. The final model occupies 86 KB of storage, with a classification accuracy of 69.5%, an increase of 22.2% over the baseline, and a log loss of 0.913, which is 0.583 lower than the baseline.
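The focal loss used for training down-weights easy, well-classified examples. A minimal NumPy sketch is given below; the focusing parameter `gamma=2.0` is the common default, not a value stated in the abstract.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0):
    """Focal loss for one-hot targets: mean of -(1 - p_t)^gamma * log(p_t),
    where p_t is the predicted probability of the true class.
    With gamma = 0 this reduces to ordinary cross-entropy."""
    p_t = np.sum(probs * targets, axis=-1)
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))

# A confident correct prediction contributes far less than under cross-entropy.
probs = np.array([[0.9, 0.1]])
targets = np.array([[1.0, 0.0]])
fl = focal_loss(probs, targets, gamma=2.0)
ce = focal_loss(probs, targets, gamma=0.0)
```

Because (1 - p_t)^gamma shrinks toward zero as p_t approaches 1, training focuses on hard, device-mismatched examples.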
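The abstract does not detail the exact 1-bit weight-training scheme; a common approach (XNOR-Net-style scaled sign binarization) can be sketched as follows. The scaling by the mean absolute weight is an assumption of this sketch.

```python
import numpy as np

def binarize_weights(w):
    """Binarize float weights to alpha * {-1, +1}, where alpha is the mean
    absolute weight. Only the sign (1 bit per weight) plus one scalar alpha
    per tensor must be stored, versus 32 bits per float32 weight."""
    alpha = np.mean(np.abs(w))
    signs = np.sign(np.where(w == 0, 1.0, w))  # map exact zeros to +1
    return alpha * signs, alpha

w = np.array([[0.5, -0.3], [0.1, -0.7]])
w_bin, alpha = binarize_weights(w)
```

In such schemes the binarized weights are used in the forward and backward passes while full-precision weights are kept only for the gradient update, which is what allows the final stored model to shrink to the kilobyte range.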
Keywords/Search Tags: Cross-device Acoustic Scene Classification, Feature Augmentation, Convolutional Neural Network, Ensemble Learning, Low Complexity