Acoustic Scene Classification Method Based On Convolutional Neural Network

Posted on:2020-04-08

Degree:Master

Type:Thesis

Country:China

Candidate:L S Sun

Full Text:PDF

GTID:2428330578465051

Subject:Information and Communication Engineering

Abstract/Summary:

Acoustic scene classification (ASC) is the task of enabling devices to make sense of their environment by analyzing audio. It is a branch of computational auditory scene analysis (CASA). ASC is already used in intelligent wearable devices, robot sensing, context-aware services, and other applications. In recent years, advances in deep learning have accelerated research on ASC. Convolutional neural networks (CNNs), an important class of deep learning models, have strong learning ability: using a CNN as the scene classifier can improve classification accuracy considerably, in some cases beyond human-level performance. To explore how well CNNs suit ASC and how system performance can be improved, this thesis designs three groups of systems and compares them experimentally. The main work is as follows.

First, a baseline system is built from Mel frequency cepstral coefficients (MFCCs) and a Gaussian mixture model (GMM). This typical traditional machine-learning system serves as the control group for the subsequent systems.

Second, the thesis studies the principles of CNN-based ASC, examines how well convolutional neural networks suit the task, and designs a basic system with a two-layer convolutional module. During training, the filter parameters are tuned to realize the system's full classification potential, and training time is also taken into account when evaluating performance. In the evaluation stage, the basic system's classification accuracy for each scene class is analyzed with a confusion matrix. The basic system learns better than the baseline but generalizes worse, and it does not make effective use of the spatial information in the audio files.

Third, to address these problems, the thesis designs an improved system that refines the basic system in both audio processing and network structure. For audio processing, a binaural representation and harmonic-percussive source separation (HPSS) are applied to the original audio before the corresponding features are extracted. This lets the system exploit the spatial characteristics of the scene and significantly improves classification accuracy. For network structure, the thesis borrows the VGGNet architecture from image recognition to increase network depth while keeping the system flexible, which yields better generalization across different data. In addition, the improved system uses stacking, an ensemble-learning method, to fuse several independent sub-models trained on different features; the ensemble outperforms each individual sub-model.

The experiments and comparisons lead to the following conclusions. Convolutional neural networks have stronger learning ability than traditional machine-learning methods for ASC. When designing a CNN, attention should be paid to the flexibility of the network, and efforts to improve performance should focus on optimizing the network structure rather than tuning parameters, to avoid the poor generalization caused by too many parameters. Integrating several models through ensemble learning usually improves performance significantly, but the sub-models should be kept independent of one another. Finally, exploiting stereo information during feature extraction improves the system's spatial perception and, in turn, its classification accuracy.
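The MFCC front end of the baseline can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the common HTK-style Mel formula m = 2595·log10(1 + f/700) and an unnormalized DCT-II, and it covers only the Mel band layout and the final cepstral step (framing, FFT, triangular filtering, and the per-class GMMs are omitted). All function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style Mel scale (an assumption; other Mel formulas exist)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_min, f_max):
    """Return n_bands + 2 frequencies (Hz) equally spaced on the Mel scale.

    Consecutive triples (lower, centre, upper) define the triangular filters.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


def cepstral_coefficients(log_mel_energies, n_coeffs):
    """Unnormalized DCT-II of log Mel-band energies; keeps the first n_coeffs terms."""
    n = len(log_mel_energies)
    return [
        sum(e * math.cos(math.pi * k * (j + 0.5) / n) for j, e in enumerate(log_mel_energies))
        for k in range(n_coeffs)
    ]
```

Because the band edges are evenly spaced in Mel rather than in Hz, low-frequency bands are narrow and high-frequency bands are wide; the DCT then decorrelates the log band energies, so the first few coefficients summarize the spectral envelope.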
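The per-class analysis of the basic system can be illustrated with a small confusion matrix. The sketch below is a generic, dependency-free version (rows are true scenes, columns are predicted scenes); the scene labels and predictions are invented for illustration and are not results from the thesis.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix, labels):
    """Diagonal count divided by row total: the share of each true class recognized correctly."""
    return {
        label: (matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0)
        for i, label in enumerate(labels)
    }


labels = ["bus", "metro", "park"]  # hypothetical scene classes
y_true = ["bus", "bus", "metro", "metro", "park", "park"]
y_pred = ["bus", "metro", "bus", "metro", "park", "park"]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm)                              # [[1, 1, 0], [1, 1, 0], [0, 0, 2]]
print(per_class_accuracy(cm, labels))  # {'bus': 0.5, 'metro': 0.5, 'park': 1.0}
```

The off-diagonal cells show which scenes are confused with each other (here bus and metro), which a single overall accuracy of 4/6 would hide.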
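One way to obtain a binaural representation is to derive extra channels from the stereo signal before feature extraction. The sketch assumes the left/right/mid/side decomposition often used in ASC work, with mid = (L + R)/2 and side = (L - R)/2; the abstract does not say which binaural channels the thesis uses, so this choice and the function name are assumptions.

```python
def binaural_channels(left, right):
    """Derive left/right/mid/side channels from a stereo signal (assumed representation)."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    mid = [(lv + rv) / 2.0 for lv, rv in zip(left, right)]   # content common to both channels
    side = [(lv - rv) / 2.0 for lv, rv in zip(left, right)]  # inter-channel difference (spatial cue)
    return {"left": list(left), "right": list(right), "mid": mid, "side": side}


stereo_left = [1.0, 0.5, -0.25]
stereo_right = [1.0, -0.5, 0.25]
channels = binaural_channels(stereo_left, stereo_right)
print(channels["mid"], channels["side"])  # [1.0, 0.0, 0.0] [0.0, 0.5, -0.25]
```

A source that reaches both microphones identically contributes only to the mid channel; the side channel carries the inter-channel differences, which is where the spatial cues live. Features such as log-Mel spectrograms would then be extracted from each channel separately.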
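Harmonic-percussive source separation can be sketched with the median-filtering approach: steady tones appear as horizontal ridges in a magnitude spectrogram and transients as vertical ridges, so a median along time enhances the harmonic part and a median along frequency enhances the percussive part. The abstract does not specify which HPSS variant the thesis uses; the median-filtering method, the soft-mask exponent, the tiny kernel width, and the function names here are assumptions. In practice a library routine such as `librosa.decompose.hpss` would be applied to an STFT magnitude.

```python
from statistics import median


def median_filter_1d(values, width):
    """Centred running median; the window is truncated at the edges."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def hpss(spec, width=3, power=2.0):
    """Split a magnitude spectrogram spec[freq][time] into harmonic and percussive parts.

    Soft masks H^p / (H^p + P^p) are used, so the two parts sum back to spec.
    """
    n_freq, n_time = len(spec), len(spec[0])
    # Harmonic estimate: median along time within each frequency bin
    harm = [median_filter_1d(row, width) for row in spec]
    # Percussive estimate: median along frequency within each frame
    cols = [median_filter_1d([spec[f][t] for f in range(n_freq)], width) for t in range(n_time)]
    perc = [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]

    harmonic = [[0.0] * n_time for _ in range(n_freq)]
    percussive = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            h, p = harm[f][t] ** power, perc[f][t] ** power
            mask = h / (h + p) if h + p > 0 else 0.5  # split evenly when both estimates are zero
            harmonic[f][t] = spec[f][t] * mask
            percussive[f][t] = spec[f][t] * (1.0 - mask)
    return harmonic, percussive


# Toy 5x7 spectrogram: a steady tone in bin 2 and a click in frame 3
spec = [[0.0] * 7 for _ in range(5)]
for t in range(7):
    spec[2][t] = 1.0
for f in range(5):
    spec[f][3] = 1.0
harmonic, percussive = hpss(spec)
print(harmonic[2][0], percussive[2][0])  # 1.0 0.0  (tone goes to the harmonic part)
print(harmonic[0][3], percussive[0][3])  # 0.0 1.0  (click goes to the percussive part)
```

Features extracted separately from the harmonic and percussive parts give the classifier two complementary views of the same scene.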
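The VGGNet design that the thesis borrows from replaces large convolution kernels with stacks of small 3×3 kernels. The arithmetic below is a sketch (the 64-channel width is a hypothetical choice, and the helper names are illustrative) showing why this adds depth without inflating the parameter count: two stacked 3×3 layers cover the same 5×5 input region as a single 5×5 layer but need fewer weights.

```python
def conv2d_weight_count(kernel, c_in, c_out, bias=True):
    """Number of learnable parameters in a 2-D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def receptive_field(kernels, strides=None):
    """Receptive field (in input bins) of a stack of convolution/pooling layers."""
    strides = strides or [1] * len(kernels)
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= s             # the stride multiplies the spacing between outputs
    return rf


channels = 64  # hypothetical layer width
single_5x5 = conv2d_weight_count(5, channels, channels, bias=False)
two_3x3 = 2 * conv2d_weight_count(3, channels, channels, bias=False)
print(receptive_field([5]), single_5x5)  # 5 102400
print(receptive_field([3, 3]), two_3x3)  # 5 73728
```

The stacked version uses 28% fewer weights for the same receptive field, and the extra non-linearity between the two small kernels adds flexibility; this is in line with the thesis's conclusion that network structure, rather than parameter count, should be the focus of optimization.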
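Stacking fuses sub-models by training a second-level (meta) classifier on their outputs. The sketch below concatenates each sub-model's class-probability vector per clip and fits a nearest-centroid meta-classifier. That meta-learner, the two-scene label set, and all probabilities are hypothetical simplifications chosen to keep the example dependency-free (logistic regression is a more common meta-learner). In a real system the training features for the meta-classifier should be out-of-fold predictions, so that it does not learn from sub-models that have already seen the same clips.

```python
def stack_features(model_outputs):
    """Concatenate sub-model outputs per clip.

    model_outputs[m][i] is sub-model m's class-probability vector for clip i.
    """
    n_clips = len(model_outputs[0])
    return [[p for outputs in model_outputs for p in outputs[i]] for i in range(n_clips)]


class NearestCentroidMeta:
    """Hypothetical meta-classifier: predicts the class whose mean stacked vector is closest."""

    def fit(self, features, labels):
        sums, counts = {}, {}
        for x, y in zip(features, labels):
            acc = sums.setdefault(y, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
        return self

    def predict(self, features):
        def sq_dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.centroids, key=lambda y: sq_dist(x, self.centroids[y])) for x in features]


# Probabilities over [bus, park] from two sub-models (e.g. trained on different features)
model_1 = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
model_2 = [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]
labels = ["bus", "bus", "park", "park"]

meta = NearestCentroidMeta().fit(stack_features([model_1, model_2]), labels)
new_clip = stack_features([[[0.45, 0.55]], [[0.8, 0.2]]])
print(meta.predict(new_clip))  # ['bus']
```

Sub-model 1 alone would pick `park` for the new clip, but the meta-classifier weighs its ambiguous output against sub-model 2's confident `bus` vote. This also shows why independence between sub-models matters: if both made the same mistakes, their concatenated outputs would add little information.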

Keywords/Search Tags:

Acoustic Scene Classification, Convolutional Neural Network, Mel frequency cepstral coefficient, ensemble learning

Related items

1	Research On Acoustic Scene Classification Based On Convolutional Neural Network
2	Research On Acoustic Scene Classification Using Deep Learning
3	Research On Acoustic Scene Recognition Algorithm Based On Convolutional Neural Network
4	Feature Augmentation And Model Build For Acoustic Scene Classification With Multiple Devices
5	Acoustic Scene Classification Based On Hybrid Convolutional Neural Network
6	Spectrogram Feature Learning And Model Transplantation Of Convolutional Neural Network Acoustic Scene Classification
7	A Study On Acoustic Scene Classification By Ensembling Multiple Deep Models
8	Research On Acoustic Scene Detection Based On Deep Learning
9	Research On Acoustic Scene Classification Method Based On Subspectrogram
10	Scene Classification Based On Convolutional Neural Network