Research On Acoustic Scene Recognition Algorithm Based On Convolutional Neural Network

Posted on: 2020-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: F J Sun
GTID: 2428330611499449
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Environmental sounds carry a large amount of information about everyday environments and physical events, and techniques that automatically extract and analyze this information are valuable in many applications. They can make a wide range of devices more intelligent and can be applied in military, criminal investigation, and other fields. Traditional acoustic scene classifiers such as the support vector machine, Gaussian mixture model, hidden Markov model, and k-nearest neighbor model are no longer suitable for complex multi-class tasks. Deep neural networks can fit highly nonlinear mappings, and among them the convolutional neural network (CNN) has developed rapidly and is the most widely used, so this thesis uses a CNN as the classifier.

Based on the structural characteristics of the classical convolutional networks AlexNet, VGGNet, and ResNet, this thesis designs three networks: Alexish, VGGish, and Resish. Alexish retains the main features of AlexNet with some improvements: AlexNet's local response normalization is replaced with batch normalization, which speeds up training and helps prevent overfitting. Experiments show that the six-layer Alexish network reaches a highest recognition rate of 67.6%. VGGish builds on VGGNet and replaces the first fully connected layer with global average pooling, which reduces the amount of computation without affecting accuracy. Experiments show that the 9-layer dual-channel VGGish network reaches a recognition rate of up to 71%. Resish applies the skip-connection mechanism of ResNet to VGGish to construct an 18-layer network, which reaches a recognition rate of 71.4%.

For audio feature extraction, this thesis proposes combining the log-Mel spectrogram with harmonic-percussive source separation (HPSS) as the audio features. The log-Mel spectrogram takes full account of the nonlinear characteristics of human hearing, and HPSS takes into account how the audio signal is composed. Experiments show that log-Mel spectrogram features computed after harmonic-percussive source separation give the network higher accuracy. In addition, to make full use of the difference between the two channels of stereo audio, the difference (left minus right) and the sum (left plus right) of the two channels are used as the inputs of the two-channel network; features are extracted from each by the convolutional neural network and then combined for classification. Experiments show that this joint-feature method performs better than the single-channel feature recognition method.

To further improve accuracy, this thesis exploits the fact that different models are sensitive to different scenes and proposes combining multiple convolutional neural network models into a strong classifier through ensemble learning. Finally, Bagging is used to integrate 7 convolutional network models, with plurality (relative majority) voting as the combination strategy, to classify 10 acoustic scenes. The accuracy is 74.7%, which is 12.2 percentage points higher than the baseline system of the 2019 Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, effectively improving the accuracy of acoustic scene recognition.
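Two of the steps described in the abstract, forming sum and difference signals from the left and right channels and combining model outputs by plurality voting, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `sum_and_difference` and `plurality_vote`, the use of raw sample lists instead of spectrogram features, the scene labels, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def sum_and_difference(
    left: Sequence[float], right: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (left + right, left - right), computed sample by sample.

    Hypothetical helper: the abstract only says the addition and
    subtraction results of the two channels feed the two-channel network.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    added = [l + r for l, r in zip(left, right)]
    subtracted = [l - r for l, r in zip(left, right)]
    return added, subtracted


def plurality_vote(predictions: Sequence[str]) -> str:
    """Return the label chosen by the largest number of models.

    Assumption: ties go to the tied label that appears first in
    `predictions`; the abstract does not state a tie-breaking rule.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: some label always has the best count")


# Example: two short stereo channels (illustrative values).
added, subtracted = sum_and_difference([0.5, 0.25], [0.25, 0.25])

# Example: seven models vote on one clip (illustrative labels).
votes = ["park", "airport", "park", "metro", "park", "airport", "tram"]
winner = plurality_vote(votes)
```

With plurality voting the winning label only needs more votes than any other label, not more than half of all votes, which matters when 7 models choose among 10 scenes.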
Keywords/Search Tags: audio scene recognition, convolutional neural network, ensemble learning