Acoustic Scene Classification With Sound Field Decomposition

Posted on: 2022-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: H C Yang
Full Text: PDF
GTID: 2518306524985309
Subject: Master of Engineering
Abstract/Summary:
Acoustic scene classification (ASC) is the task of having a machine identify the environment it is in from the sounds it receives from the outside world. In recent years, benefiting from breakthroughs in computational power and algorithms, ASC research has entered a period of rapid iteration. At the same time, technical progress has exposed more and more new problems, one of which is the handling of stereo data sources. The largest dataset in the acoustic scene field is now in stereo format, yet mainstream ASC methods still treat stereo input with a single-channel mindset and pay no special attention to the additional spatial information that stereo carries.

The first contribution of this thesis is to extract that additional spatial information from stereo data sources. Using the phase information contained in the stereo signal, a sound field decomposition algorithm called primary ambient extraction (PAE) further decomposes the stereo audio into four channels, which exposes more spatial information. The acoustic feature is still log mel energy, which discarded phase information in previous implementations; in the proposed PAE-based implementation, however, the phase information is encoded into the amplitude information of the added channels. On this basis, a variety of state-of-the-art ASC methods are used to build several VGGNet models, and ensemble learning is then applied to obtain better performance. In a series of experiments that migrate the system to different data distributions, the system with sound field decomposition shows a stable performance improvement over the system without it. The best configuration improves recognition accuracy by 17.3% over the baseline system and ranked fourth in the ASC subtask of the global DCASE 2019 challenge, two places higher than the same system without sound field decomposition.

Building on the first contribution, the thesis also explores an application of ASC. After the functionally redundant data in the original dataset are merged into three classes, the system is migrated to a low-complexity setting. First, the sound field decomposition algorithm is optimized, making it nearly 10 times faster than the original implementation. At the same time, sound field decomposition is moved from feature generation to data augmentation in order to reduce feature complexity. On this basis, the model parameters are compressed to less than 1% of the original setting. The result is a low-complexity stereo ASC system that uses the fine receptive field resolution provided by a stack of 1×1 residual blocks, with a size similar to that of the baseline system and an error rate less than one third of the baseline's. Finally, a variety of model compression methods are used to verify that data augmentation with sound field decomposition still works when the model is compressed further.
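The abstract does not state which PAE variant the thesis uses, so the following is only a minimal sketch of the four-channel idea under stated assumptions: a PCA-based PAE computed independently per frequency bin, a plain Hann-windowed STFT, and a simple triangular mel filterbank. All function names (`stft`, `pca_pae`, `mel_filterbank`, `log_mel_features`) and parameter values are hypothetical and are not taken from the thesis. It shows how inter-channel phase and correlation end up in the amplitudes of the primary and ambient channels before the log mel step.

```python
import numpy as np

def stft(x, n_fft=1024, hop=512):
    """Hann-windowed STFT, returned as a complex (freq, time) array."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def pca_pae(left, right, n_fft=1024, hop=512):
    """Assumed PCA-based PAE: split stereo into 4 STFT-domain channels.

    Returns (4, freq, time): primary L, primary R, ambient L, ambient R.
    """
    X = np.stack([stft(left, n_fft, hop), stft(right, n_fft, hop)])
    # 2x2 inter-channel covariance per frequency bin, averaged over time.
    cov = np.einsum("ift,jft->fij", X, X.conj()) / X.shape[2]
    # eigh returns eigenvalues in ascending order, so the last
    # eigenvector is the principal (most correlated) direction.
    _, vecs = np.linalg.eigh(cov)
    u = vecs[:, :, -1]                             # (freq, 2)
    # Primary = projection of the stereo vector onto u; ambient = residual.
    coeff = np.einsum("fi,ift->ft", u.conj(), X)   # (freq, time)
    primary = np.einsum("fi,ft->ift", u, coeff)    # (2, freq, time)
    ambient = X - primary
    return np.concatenate([primary, ambient], axis=0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0),
                                  n_mels + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, len(freqs)))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_features(channels, sr, n_fft=1024, n_mels=40):
    """Log mel energy per channel: (channels, n_mels, time)."""
    power = np.abs(channels) ** 2
    fb = mel_filterbank(sr, n_fft, n_mels)
    mel = np.einsum("mf,cft->cmt", fb, power)
    return np.log(mel + 1e-10)

# Toy stereo input: one panned tone (primary) plus independent noise (ambient).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
source = np.sin(2 * np.pi * 440 * t)
left = 0.8 * source + 0.1 * rng.standard_normal(sr)
right = 0.4 * source + 0.1 * rng.standard_normal(sr)

channels = pca_pae(left, right)
features = log_mel_features(channels, sr=sr)
print(features.shape)  # (channels, mel bands, frames)
```

The resulting four-channel log mel tensor can be fed to a convolutional classifier in place of the usual two-channel input. Log mel energy on its own keeps only magnitudes, but here the split between primary and ambient channels depends on the inter-channel covariance, which is the sense in which phase information is carried into the amplitudes of the added channels.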
Keywords/Search Tags:Acoustic scene classification, Stereo, Sound field decomposition, Low complexity, ResNet