Font Size: a A A

A Study On Acoustic Scene Classification By Ensembling Multiple Deep Models

Posted on:2018-01-14Degree:MasterType:Thesis
Country:ChinaCandidate:F F PengFull Text:PDF
GTID:2348330533969798Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Acoustic scene classification(ASC)is a specific task in Computational Auditory Scene Analysis(CASA)domain.It based on the audio and a specific scene semantic label to recognize the specific scene in real environment.Unlike the psychological studies devoted to understanding the mechanism of perceiving audio scene,ASC mainly relies on signal processing techniques and machine learning methods to automatically recognize audio scenes.Traditional ASC task mainly focuses on Feature Engineering and classifier selection for detecting a single scene.With the rapid development of audio collection devices,a wide variety of audio have been collected.And with a large variety of audio scene data,the traditional method has more and more difficulties in applying or acquiring better performance.In order to make full use of a large va riety of audio scene data,this paper uses various deep learning methods.Firstly,extracting Mel-Frequency Cepstral Coefficients(MFCC)feature or log-mel spectrogram feature.Then the frame features are spliced into segment features.Finally,put these s egment features into the deep learning models,such as Multi-layer Perceptron(MLP),Convolutional Neural Network(CNN),Long Short-Term Memory(LSTM).In order to improve the ASC system based on LSTM model,a segment processing technique is proposed in th is paper.The technique not only can simulate complex temporal relations,but also enlarge training data.In order to improve the ASC system based on MLP model,Attention mechanism is introduced in this method.After this process,the network can break thr ough the limitations of global data representation,and pay more attention to the key parts of the data.At the same time,the Attention mechanism can deal with the decoupling problem.That is to say,using different feature space to describe different scenes.Different kinds of deep learning methods have different recognition abilities when recognizing different scenes.For example,MLP has an advantage over beaches and residential areas,while CNN is easier to detecting libraries and buses.Usually,Ensemble learning can significantly achieve better generalization performance than a best single learner.In order to make full use of different classifiers,the Bagging Ensemble Selection(BES)is used and has received a best performance.
Keywords/Search Tags:acoustic scene classification, deep learning, attention, ensemble methods
PDF Full Text Request
Related items