
Acoustic Scene Classification Method Based On Convolutional Neural Network

Posted on: 2020-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: L S Sun
Full Text: PDF
GTID: 2428330578465051
Subject: Information and Communication Engineering
Abstract/Summary:
Acoustic scene classification (ASC) is the task of enabling devices to make sense of their environment through audio analysis. It is a branch of computational auditory scene analysis (CASA). ASC is now widely used in intelligent wearable devices, robotic sensing, context-aware services, and other applications. In recent years, the development of deep learning has accelerated research on acoustic scene classification. Convolutional neural networks (CNNs), an important class of deep learning models, have strong learning ability. Using a CNN as the acoustic scene classifier can improve classification accuracy considerably, to the point where the machine can even exceed human-level performance. To explore the applicability of CNNs to ASC and to find ways of improving system performance, this thesis designs three groups of systems and compares them experimentally. The main work is as follows.

First, the thesis designs a baseline system based on Mel-frequency cepstral coefficients (MFCC) and a Gaussian mixture model (GMM). Built with traditional machine learning, this typical baseline serves as the control group for the subsequent systems.

Second, it studies the principle of CNN-based acoustic scene classification, examines the applicability of CNNs to the task, and designs a basic system with a two-layer convolution module. During training, the filter parameters are adjusted to bring out the system's full classification potential, and training time is also taken into account when evaluating system performance. In the evaluation stage, the per-class classification accuracy of the basic system is analyzed with the help of a confusion matrix. The basic system is found to have stronger learning ability than the baseline system but poorer generalization ability, and it does not make effective use of the spatial information in the audio.

Third, to address these problems, the thesis designs an improved system that refines the basic system in both audio processing and network structure. For audio processing, binaural representation and harmonic-percussive source separation (HPSS) are applied to the original audio before the corresponding features are extracted. This lets the system exploit the spatial characteristics of the scene, and classification accuracy improves significantly. For the network structure, the thesis draws on the VGGNet architecture from image recognition, increasing the network depth while enhancing the flexibility of the system, and ultimately achieves better generalization on different data. The improved system also uses the stacking method from ensemble learning to fuse multiple independent sub-models built on different features. Compared with the individual sub-models, the classification performance of the fused system is further improved.

The experiments and comparisons lead to the following conclusions. CNNs have stronger learning ability than traditional machine learning methods in ASC. When designing a CNN, attention should be paid to the flexibility of the network, and efforts to improve system performance should focus on optimizing the network structure rather than on parameter adjustment, so as to avoid the poor generalization caused by too many parameters. Integrating multiple models through ensemble learning can usually improve performance significantly, but the independence between the models must be taken into account during integration. Finally, if stereo information can be used in the audio feature extraction stage, the spatial perception of the system improves, and classification accuracy improves with it.
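
The per-class evaluation mentioned in the abstract can be illustrated with a small worked example. The sketch below is not the thesis's code: the scene labels, predictions, and function names are hypothetical, chosen only to show how a confusion matrix yields an accuracy figure for each category.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def per_class_accuracy(matrix, classes):
    """Fraction of clips of each true class that were classified correctly."""
    result = {}
    for i, name in enumerate(classes):
        total = sum(matrix[i])
        result[name] = matrix[i][i] / total if total else 0.0
    return result


# Hypothetical scene labels and predictions, for illustration only.
classes = ["park", "metro", "office"]
y_true = ["park", "park", "metro", "metro", "office", "office"]
y_pred = ["park", "metro", "metro", "metro", "office", "park"]

cm = confusion_matrix(y_true, y_pred, classes)
acc = per_class_accuracy(cm, classes)

for name, row in zip(classes, cm):
    print(f"{name:>6}: {row}  accuracy={acc[name]:.2f}")
```

Off-diagonal entries show which scenes are confused with each other. In this toy data one "park" clip is labelled "metro" and one "office" clip is labelled "park", which an overall accuracy figure alone would hide.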
Keywords/Search Tags: Acoustic Scene Classification, Convolutional Neural Network, Mel-frequency cepstral coefficient, ensemble learning