
Acoustic Scene Classification Based On Hybrid Convolutional Neural Network

Posted on: 2021-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Zhang
Full Text: PDF
GTID: 2428330629982526
Subject: Information and Communication Engineering
Abstract/Summary:
Audio signals carry a great deal of information. Through audio signal processing, people can better perceive and understand their surroundings, so the technology is widely used in monitoring, hearing-aid devices, and intelligent terminals. Compared with image data, audio data is simpler to collect and occupies less memory, and recorded audio can readily convey the information it contains. With the rapid development of Internet technology, Acoustic Scene Classification (ASC) algorithms are finding more and more applications. ASC analyzes an audio recording and its semantic features in order to recognize and understand the content of the surrounding environment. The ASC pipeline consists of feature extraction followed by classifier construction.

The main feature-extraction methods are the Mel spectrogram and the Mel-Frequency Cepstral Coefficients (MFCC). Under the same Convolutional Neural Network (CNN) structure and model parameters, classification experiments were run with each of these two features. The results show that MFCC better reveal the essential differences between types of audio signals, and the per-class precision for every audio scene is higher than with the Mel spectrogram.

A CNN improves model performance by applying nonlinear feature mappings to the image-like representation of the audio signal and by effective training. Because a single CNN classifier is prone to over-fitting and related problems, fusion experiments on the system's neural network were carried out, mainly introducing the Long Short-Term Memory (LSTM) network and the eXtreme Gradient Boosting (XGBoost) algorithm. Since audio signals are time-sequential, an LSTM is introduced after the CNN extracts abstract features, processing the audio information along the time axis of the signal; this yields a hybrid CNN-LSTM model in which spectrograms are trained and classified by a final softmax layer. This classifier predicts the audio signal directly with softmax and does not further train the extracted features before outputting the classification result, which is a shortcoming. A hybrid model of CNN and XGBoost is therefore also proposed: by replacing the softmax classifier with XGBoost, the extracted features are trained again to predict the audio scene class. XGBoost iterates the loss function through a tree model, optimizes the objective function, and outputs the classification result at the leaf nodes.

The system model is trained and tested on an urban audio dataset containing 10 categories, using accuracy, precision, recall, and F1-Score as performance indicators. The experimental results show that the hybrid CNN-XGBoost model achieves the highest accuracy and precision, and its per-class precision, recall, and F1-Score are better than those of the other models, verifying that the hybrid model adopted in this thesis handles the Acoustic Scene Classification task well.
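Both the Mel spectrogram and MFCC features discussed above rest on the Mel frequency scale. As a minimal sketch of that scale, here is the common HTK-style conversion formula (an assumption, since the abstract does not specify which Mel-scale variant was used):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    # HTK-style Mel scale: compresses high frequencies,
    # roughly mimicking human pitch perception.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    # Inverse mapping, used to place the triangular Mel
    # filterbank edges back on the linear Hz axis.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

By construction, 1000 Hz maps to approximately 1000 mel; MFCC are then obtained by applying a Mel filterbank to the power spectrum, taking logarithms, and applying a discrete cosine transform.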
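The evaluation above uses accuracy, precision, recall, and F1-Score. As a minimal sketch of how these per-class indicators are computed from predictions (the scene labels here are hypothetical, not the thesis's data):

```python
def classification_metrics(y_true, y_pred, positive):
    # Count outcomes for one scene class treated as "positive".
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Accuracy is computed over all classes, not just "positive".
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return accuracy, precision, recall, f1
```

For example, with true labels `["park", "metro", "park", "street"]` and predictions `["park", "park", "park", "street"]`, the "park" class gets precision 2/3, recall 1.0, and F1 0.8, with overall accuracy 0.75.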
Keywords/Search Tags:Acoustic Scene Classification, Mel Spectrogram, MFCC, CNN, LSTM, XGBoost