
Spectrogram Feature Learning and Model Transplantation for Convolutional Neural Network Acoustic Scene Classification

Posted on: 2021-07-01    Degree: Master    Type: Thesis
Country: China    Candidate: L J Tao    Full Text: PDF
GTID: 2518306107492074    Subject: Engineering
Abstract/Summary:
Acoustic scene classification is one of the most important research topics in the field of machine listening. It aims to analyze audio data and classify it into one of a set of predefined categories, such as "park", "pedestrian street", or "subway station". Designing acoustic signal processing methods that automatically extract scene information has great potential in many applications: an intelligent vehicle can use audio to analyze its surrounding environment and make corresponding auxiliary decisions, and noise-canceling headphones can identify the scene by collecting and analyzing ambient sound and then generate a matching noise-reduction curve. In recent years, deep convolutional neural networks have been widely applied to image recognition, object detection, and semantic segmentation, and more and more acoustic researchers are turning to them as well. Given the above, this thesis studies acoustic scene classification methods based on deep convolutional neural networks.

Current research shows that, in acoustic scene classification, a convolutional neural network is sensitive only to certain frequency bands of the spectrogram, such as bands containing background sound or salient scene characteristics. Therefore, when the spectrogram is fed directly into a convolutional neural network for training, the non-discriminative frequency-band features in the spectrogram may blur the classification boundary. Besides, although convolutional neural networks have achieved breakthroughs in acoustic scene classification, their highly complex structures also cause many problems when the algorithms are deployed; how to optimize the network structure and reduce the number of model parameters remains a major challenge. To alleviate these problems, this thesis carries out the following work:

(1) A feature extraction method for the log-mel spectrogram based on a convolutional neural network is proposed. This method uses a convolutional neural network to extract log-mel spectrogram features and handles audio signals in an end-to-end manner based on deep learning. Compared with manual feature extraction, it reduces the storage cost of features and enables rapid feature extraction on hardware platforms optimized for neural networks (a sketch of such a convolutional front end is given after item (2) below).

(2) An acoustic scene classification method based on multi-channel spectrogram features is proposed. This method first reorganizes the frequency-band information of the log-mel features to generate multi-channel spectrogram features with different band compositions. Then, since different channels carry different band information and therefore contribute differently to acoustic scene classification, a channel attention network is introduced to select the important reorganized band channels for scene recognition (see the second sketch below). Finally, label smoothing is used to improve the generalization ability of the model. On one hand, this method selects the discriminative recombined-band spectrogram features for classification; on the other hand, because each channel of the multi-channel feature results from randomly selecting frequency bands of the original spectrogram, it has a certain locality, and the multi-channel features combined with ensemble learning provide clear information complementarity, which improves classification performance to a certain extent. Experiments conducted on the DCASE 2019 dataset achieve a classification accuracy of 79.64% on the validation set, higher than the official baseline and most open-source approaches.
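The following is a minimal sketch, not the thesis's actual network, of how item (1)'s idea can be realized: the STFT and the mel filterbank are expressed as fixed (non-trainable) convolution and projection layers, so log-mel extraction runs inside the network on neural-network-optimized hardware. The window, hop, and mel-bin sizes are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torchaudio

class LogMelFrontEnd(nn.Module):
    """Log-mel extraction as fixed convolution layers (illustrative sizes)."""
    def __init__(self, sample_rate=44100, n_fft=2048, hop=1024, n_mels=128):
        super().__init__()
        n_freqs = n_fft // 2 + 1
        # STFT expressed as a 1-D convolution with fixed Fourier-basis kernels.
        window = torch.hann_window(n_fft)
        k = torch.arange(n_fft, dtype=torch.float32)
        f = torch.arange(n_freqs, dtype=torch.float32)
        angles = 2 * math.pi * f[:, None] * k[None, :] / n_fft
        self.conv_real = nn.Conv1d(1, n_freqs, n_fft, stride=hop, bias=False)
        self.conv_imag = nn.Conv1d(1, n_freqs, n_fft, stride=hop, bias=False)
        self.conv_real.weight = nn.Parameter(
            (torch.cos(angles) * window).unsqueeze(1), requires_grad=False)
        self.conv_imag.weight = nn.Parameter(
            (-torch.sin(angles) * window).unsqueeze(1), requires_grad=False)
        # Mel filterbank as a fixed linear projection over frequency bins.
        fb = torchaudio.functional.melscale_fbanks(
            n_freqs, f_min=0.0, f_max=sample_rate / 2,
            n_mels=n_mels, sample_rate=sample_rate)      # (n_freqs, n_mels)
        self.register_buffer("mel_fb", fb)

    def forward(self, wav):                              # wav: (batch, samples)
        x = wav.unsqueeze(1)                             # -> (batch, 1, samples)
        power = self.conv_real(x) ** 2 + self.conv_imag(x) ** 2  # (B, F, T)
        mel = torch.matmul(self.mel_fb.T, power)                  # (B, mels, T)
        return torch.log(mel + 1e-6)                     # log-mel spectrogram
```

For example, `LogMelFrontEnd()(torch.randn(4, 44100))` yields a (4, 128, T) feature map that can be fed directly to the classification network, with no separately stored hand-crafted features.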
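Below is a minimal sketch of the two mechanisms described in item (2): reorganizing log-mel bands into a multi-channel feature, and a squeeze-and-excitation style channel attention over those channels. The thesis does not spell out its exact attention architecture, so the grouping scheme, layer sizes, and smoothing factor here are assumptions.

```python
import torch
import torch.nn as nn

def reorganize_bands(logmel, n_channels=4, generator=None):
    """(batch, n_mels, time) -> (batch, n_channels, bands, time).
    Each channel holds a random subset of frequency bands; in practice the
    grouping would be drawn once and then fixed for the whole model."""
    b, n_mels, t = logmel.shape
    bands = n_mels // n_channels
    perm = torch.randperm(n_mels, generator=generator)
    chans = [logmel[:, perm[i * bands:(i + 1) * bands], :]
             for i in range(n_channels)]
    return torch.stack(chans, dim=1)

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation attention over the reorganized band channels."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                 # x: (B, C, bands, T)
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pool -> (B, C)
        return x * w[:, :, None, None]    # excite: rescale each band channel

# Label smoothing is available directly in the classification loss:
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```

The sigmoid weights let the network down-weight channels whose bands carry little scene-discriminative information, which matches the motivation that only some frequency bands are useful for classification.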
(3) An acoustic scene classification method for embedded platforms is developed. This method addresses the problem that mobile platforms have limited resources and cannot deploy large-scale convolutional neural network algorithms. First, a simplified MobileNetV2 model is built. Then, based on model parameter fusion and fixed-point quantization, the simplified model is optimized and compressed. Finally, the acoustic scene classification algorithm is ported to the embedded platform, with the parameter count reduced to 30% of the original MobileNetV2 model (a sketch of the two compression steps follows below).
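The following sketch illustrates, under stated assumptions, the two compression steps named in item (3): folding batch-normalization parameters into the preceding convolution (parameter fusion), and converting the model to int8 fixed-point with PyTorch's post-training static quantization. The `SimplifiedMobileNetV2` class name is hypothetical.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold y = bn(conv(x)) into a single convolution (parameter fusion)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation,
                      conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # per out-channel
    fused.weight.data = conv.weight.data * scale[:, None, None, None]
    bias = (conv.bias.data if conv.bias is not None
            else torch.zeros(conv.out_channels))
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
    return fused

# Fixed-point conversion via post-training static quantization (eager mode):
# model = SimplifiedMobileNetV2().eval()          # hypothetical model class
# model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
# prepared = torch.ao.quantization.prepare(model)
# for batch in calibration_loader:                # calibrate the observers
#     prepared(batch)
# quantized = torch.ao.quantization.convert(prepared)  # int8 fixed-point model
```

Fusing batch norm removes its parameters and one memory pass per layer, and the int8 model stores weights in a quarter of the float32 footprint, which is the kind of reduction that makes embedded deployment feasible.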
Keywords/Search Tags:Acoustic Scene Classification, Convolutional Neural Network, Spectrogram Feature Learning, Model Transplantation