
Acoustic Scene Classification Using Multi-Scale Deep Feature Aggregation

Posted on: 2022-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: Ho Ka ChonHJJ
Full Text: PDF
GTID: 2518306569469764
Subject: Information and Communication Engineering
Abstract/Summary:
Acoustic scene classification (ASC) is a machine listening task whose goal is to recognize an audio recording and assign it one of a set of predefined labels, each describing a scene location. In most state-of-the-art ASC systems, hand-crafted features and single-scale deep features are used as the input to back-end classifiers. Because audio signals are noisy and time-frequency properties vary greatly within each class of acoustic scene, these features cannot effectively represent the characteristic differences among acoustic scenes. As a result, ASC remains challenging and unsolved despite significant effort by many researchers. Inspired by the success of multi-scale deep features in computer vision, we propose an ASC method that aggregates multi-scale deep features learned by convolutional neural networks (CNNs). We perform experiments using several state-of-the-art ASC models on two official datasets of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), namely DCASE 2019 and DCASE 2017. Both datasets are publicly available for research and competition, and contain 10 and 15 classes of acoustic scenes, respectively. Together, the two datasets provide 57 hours of audio recordings covering indoor and outdoor scenes such as city, bus, beach, train, airport, and forest path. In the proposed method, Mel-frequency cepstral coefficients (MFCCs) are first extracted from each audio sample and then fed into two CNNs with different architectures. Next, the multi-scale deep features generated by the two CNNs are concatenated and fed into fully connected layers to obtain the classification result. Compared with the baseline system, the proposed method improves classification accuracy by 11% and 9% on the DCASE 2019 and DCASE 2017 datasets, respectively.
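
The aggregation step described in the abstract (two convolutional branches, concatenated features, fully connected classifier) can be illustrated with a minimal sketch in plain Python. This is not the thesis architecture, which the abstract does not specify. The temporal kernel sizes (3 and 7), the filter count, the random weights, and the random stand-in for the MFCC matrix are all assumptions chosen only to show how features computed at two scales might be combined.

```python
import math
import random


def conv_branch(mfcc, filters):
    """Temporal convolution, ReLU, then global average pooling over time.

    mfcc:    list of T frames, each a list of C coefficients.
    filters: list of filters, each shaped [k][C] (k = temporal kernel size).
    Returns one pooled activation per filter.
    """
    k = len(filters[0])
    n_coeffs = len(mfcc[0])
    n_positions = len(mfcc) - k + 1
    pooled = []
    for filt in filters:
        total = 0.0
        for t in range(n_positions):
            act = sum(
                filt[i][c] * mfcc[t + i][c]
                for i in range(k)
                for c in range(n_coeffs)
            )
            total += max(act, 0.0)  # ReLU
        pooled.append(total / n_positions)
    return pooled


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(mfcc, branch_a, branch_b, fc_weights, fc_bias):
    """Concatenate the features of both branches and apply one dense layer."""
    features = conv_branch(mfcc, branch_a) + conv_branch(mfcc, branch_b)
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]
    return features, softmax(logits)


def random_filters(rng, n_filters, k, n_coeffs):
    """Hypothetical untrained filters; a real system would learn these."""
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(n_coeffs)] for _ in range(k)]
        for _ in range(n_filters)
    ]


rng = random.Random(0)
N_FRAMES, N_COEFFS, N_FILTERS, N_CLASSES = 50, 13, 4, 10  # assumed sizes

# Placeholder for a real MFCC matrix extracted from an audio clip.
mfcc = [[rng.gauss(0.0, 1.0) for _ in range(N_COEFFS)] for _ in range(N_FRAMES)]

branch_a = random_filters(rng, N_FILTERS, 3, N_COEFFS)  # short temporal context
branch_b = random_filters(rng, N_FILTERS, 7, N_COEFFS)  # longer temporal context
fc_weights = [
    [rng.uniform(-0.1, 0.1) for _ in range(2 * N_FILTERS)]
    for _ in range(N_CLASSES)
]
fc_bias = [0.0] * N_CLASSES

features, probs = classify(mfcc, branch_a, branch_b, fc_weights, fc_bias)
```

Here `features` holds the concatenated outputs of both branches and `probs` is a distribution over 10 scene classes, matching the class count of DCASE 2019. In practice the branches would be trained CNNs built with a deep learning framework rather than hand-written loops.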
Keywords/Search Tags:Acoustic scene classification, Multi-scale deep feature, Deep neural network, Deep learning