
Research On Audio Scene Recognition Based On Deep Learning

Posted on: 2019-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
Full Text: PDF
GTID: 2428330545474788
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the Internet and new media platforms, the total volume of audio data keeps growing. In the context of big data and artificial intelligence, audio scene recognition is being applied in more and more fields, and its importance is growing accordingly. Audio scene recognition essentially perceives the sound features and acoustic events contained in a sound signal, then processes and analyzes these features to classify the audio signal. The choice of acoustic features directly affects the quality of the classification results, so choosing appropriate acoustic features is especially critical. The Mel Frequency Cepstrum Coefficient (MFCC) simulates how the human ear processes sound. It is easier to compute than other acoustic features and captures the discriminative part of the signal. Therefore, in audio recognition classification tasks, the anti-interference capability of MFCC is superior to that of other acoustic features.

In recent years, research on artificial intelligence has gradually matured and deep learning has developed rapidly. It has made breakthroughs in pattern recognition, machine learning, and related fields, and more and more researchers are working on it. The deep neural network is an important research direction in deep learning. Compared with shallow neural networks, it has a more complex network structure, more powerful combined computing capability, and finer-grained feature analysis capability. The Convolutional Neural Network (CNN) is a classical deep neural network model. Because of its weight sharing and local connectivity, a CNN has fewer parameters to learn and fewer network nodes during training, which reduces network complexity and computational overhead. The CNN also has excellent feature extraction capability. Owing to its network structure, it is more robust to distortion, has input invariance, and performs particularly well in classification tasks.

This thesis selects MFCC as the acoustic feature extracted from scene audio and uses the CNN's strength in feature extraction and classification to further extract and analyze the acoustic features, obtain higher-level and more abstract features, and classify them, with the aim of improving classification accuracy. The main work is as follows:

(1) Audio scene recognition technology, deep learning, and acoustic features are studied in depth. The state of development of each field in China and abroad is collated and summarized, and the development history of deep learning and audio scene recognition, together with the classification and characteristics of acoustic features, is introduced in detail.

(2) The principle of MFCC is explained and its extraction process is given. The theoretical basis and classification process of the K-Nearest Neighbor (KNN) algorithm are introduced. A baseline experimental system using KNN as the classifier is designed and built, and multiple sets of controlled experiments are run with important parameters adjusted, to obtain and analyze experimental results.

(3) The network structure, calculation methods, learning algorithms, and application scope of the CNN are introduced. A scene audio classification system based on MFCC and CNN is designed and built, and the overall experimental flow and the architecture of the CNN in the system are given. First, acoustic features are extracted from the scene audio, and the acoustic feature data set is divided into a training sample set and a test sample set. Next, the training sample set is used to train the CNN until it converges, so that learned parameters such as the weights reach an optimal state. Finally, the trained CNN identifies the audio scenes of the test sample set, which gives the scene recognition accuracy. In addition, important CNN parameters (convolution kernel size, number of feature maps, activation function, etc.) are adjusted to compare the magnitude and trend of the classification accuracy after each adjustment.

Comparing the experimental results of the two groups, the overall recognition rate of the scene recognition system based on MFCC and CNN is 1.4% higher than that of the baseline system. After parameters such as the convolution kernel size and the number of feature maps are adjusted, the overall recognition rate increases slightly. Therefore, the experimental system based on MFCC and CNN is superior to the baseline system in overall recognition rate.
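The baseline evaluation described in the abstract (features split into a training set and a test set, a KNN classifier, and a recognition rate measured on the test set) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the value k=3, the Euclidean distance, and the tiny three-dimensional vectors standing in for per-clip MFCC features are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Rank training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_x)),
                     key=lambda i: euclidean(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test vectors whose predicted scene matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


# Hypothetical stand-ins for per-clip MFCC vectors (made-up numbers).
train_x = [[1.0, 0.9, 1.1], [0.8, 1.2, 1.0], [1.1, 1.0, 0.9],
           [5.0, 5.2, 4.9], [5.1, 4.8, 5.0], [4.9, 5.0, 5.1]]
train_y = ["office", "office", "office", "street", "street", "street"]
test_x = [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
test_y = ["office", "street"]

print(accuracy(train_x, train_y, test_x, test_y, k=3))
```

In a real experiment the vectors would come from an MFCC extractor applied to each audio clip, and k would be one of the parameters varied across the controlled experiments.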
Keywords/Search Tags: Audio scene recognition, Convolutional neural network, Mel frequency cepstrum coefficient, K-nearest neighbor algorithm