Research On Acoustic Scene Classification Using Deep Learning

Posted on: 2022-02-22    Degree: Master    Type: Thesis
Country: China    Candidate: G J Qiao    Full Text: PDF
GTID: 2518306476490734    Subject: Communication and Information System
Abstract/Summary:
Acoustic scene classification (ASC) associates an audio recording with the scene in which it was recorded, and is one of the important topics in computational auditory scene analysis. It works by extracting features from the audio signal and classifying those features into the corresponding scene. Current ASC systems therefore consist mainly of an audio feature extractor and a classifier. The extracted features are mainly Mel Frequency Cepstral Coefficients (MFCC) and the Log-Mel spectrogram; the classifiers are mainly recurrent neural networks, convolutional neural networks and deep neural networks. Researchers improve model performance by improving a single model, by multi-model ensembling, and by transfer learning. When video information is of poor quality, using audio analysis to assist a video classification system can contribute to the development of autonomous driving and smart cities.

To address the low accuracy of acoustic scene classification, this thesis focuses on improving single-model performance. The research covers three aspects. First, audio features are extracted from the Log-Mel spectrogram by varying the number of filters, using different audio channels, and applying Harmonic Percussive Source Separation (HPSS) as an enhancement method. Second, with a convolutional neural network as the classifier, a Squeeze-and-Excitation (SE) module is added so that the network can attend to information across feature channels, and SE is also applied in a new way to extract information across different frequencies. Third, based on the classic convolutional neural network structure Visual Geometry Group (VGG) and the basic structural unit of Inception, one Inception unit and two VGG basic units are combined into a hybrid network that serves as the classifier.

Experiments on the dataset of the 2019 Detection and Classification of Acoustic Scenes and Events (DCASE) challenge show that features extracted with an appropriate number of filters and with HPSS improve classification accuracy; that the channel-based SE block improves classification performance; that the frequency-based SE module improves classification in some scenes; and that the hybrid-network model achieves better classification accuracy in some scenes.
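
The squeeze-and-excitation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact layer design, so everything below is an assumption made for illustration: the feature map layout `x[channel][frequency][time]`, the bottleneck `reduction` ratio, the omission of biases, the random weights, and the helper names `squeeze_excite` and `random_weights`. The frequency variant is only one plausible reading of "SE across different frequencies", not the thesis's confirmed implementation.

```python
import math
import random


def squeeze_excite(x, w1, w2, axis=0):
    """Rescale a feature map x[c][f][t] along `axis` (0 = channel, 1 = frequency).

    w1 has shape (hidden, n) and w2 has shape (n, hidden), where n is the
    size of the chosen axis. Biases are omitted to keep the sketch short.
    """
    n_c, n_f, n_t = len(x), len(x[0]), len(x[0][0])

    # Squeeze: average over every axis except the chosen one.
    if axis == 0:
        z = [sum(sum(row) for row in x[c]) / (n_f * n_t) for c in range(n_c)]
    elif axis == 1:
        z = [sum(sum(x[c][f]) for c in range(n_c)) / (n_c * n_t)
             for f in range(n_f)]
    else:
        raise ValueError("axis must be 0 (channel) or 1 (frequency)")

    # Excitation: bottleneck layer -> ReLU -> layer -> sigmoid, one gate per slice.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]

    # Scale: multiply each channel (or frequency bin) by its gate.
    out = [[[x[c][f][t] * (gates[c] if axis == 0 else gates[f])
             for t in range(n_t)] for f in range(n_f)] for c in range(n_c)]
    return out, gates


def random_weights(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_c, n_f, n_t, reduction = 4, 8, 5, 2  # toy sizes, chosen for illustration
x = [[[rng.random() for _ in range(n_t)] for _ in range(n_f)]
     for _ in range(n_c)]

# Channel-based SE: one gate per feature channel.
y_ch, g_ch = squeeze_excite(
    x,
    random_weights(n_c // reduction, n_c, rng),
    random_weights(n_c, n_c // reduction, rng),
    axis=0,
)

# Frequency-based SE: one gate per frequency bin.
y_fr, g_fr = squeeze_excite(
    x,
    random_weights(n_f // reduction, n_f, rng),
    random_weights(n_f, n_f // reduction, rng),
    axis=1,
)
```

In both variants the gates lie strictly between 0 and 1, so the module can only attenuate or preserve a channel or frequency bin relative to the others. In a trained network the two weight matrices would be learned jointly with the convolutional layers.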
Keywords/Search Tags:Acoustic scene classification, Convolutional neural network, Squeeze excitation, Harmonic percussive source separation