
Sound Event Recognition Based On Feature Fusion And Neural Network

Posted on: 2022-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2518306737954049
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of information technology, sound event recognition has received extensive attention in recent years. The technology has important value in fields such as audio surveillance and medical diagnosis. Sound event recognition consists of two stages: feature extraction and classification. For feature extraction, the spectrogram is commonly used; as the "visual language" of a sound signal, it reflects both the time-domain and frequency-domain information of the signal. For classification, the Convolutional Neural Network (CNN) is used in most cases because of its powerful feature-representation capability. However, a CNN cannot effectively model the temporal context of its input, and as the network deepens it becomes prone to vanishing or exploding gradients during training. In addition, as task requirements grow, model structures become increasingly complex, and a single feature is often insufficient to support complex tasks. To address these problems, this paper applies feature fusion and improves the CNN for sound event recognition. The main contributions are as follows:

1. A sound event recognition method based on multiple feature channels and a Squeeze-and-Excitation residual network is proposed. First, the sound signal is processed to obtain the logarithmic Mel spectrogram, the logarithmic Cochleagram, and the logarithmic Constant-Q Transform spectrogram. The three features are then fused in a manner similar to the RGB channels of an image, so that the combined feature carries complementary information and serves as the input to the Squeeze-and-Excitation residual network. This network introduces the Squeeze-and-Excitation module into a residual network: the convolutional layers extract useful information, which is passed to layers containing the residual module and the Squeeze-and-Excitation module. These layers attend to the relationships between channels, enhancing task-relevant features and suppressing irrelevant ones. Next, a global average pooling layer reduces the number of trainable parameters and the risk of overfitting. Finally, a Softmax layer classifies the sound events. Experimental results show that the proposed method achieves better recognition performance.

2. A sound event recognition method based on feature fusion and a Convolution-Gated Recurrent Unit neural network is proposed. Feature fusion cascades (concatenates) the logarithmic Mel spectrogram, the logarithmic Cochleagram, and the logarithmic Constant-Q Transform spectrogram, increasing the information content and richness of the features so that they can be analyzed and processed more effectively. Although a CNN extracts features effectively, it cannot express temporal context well; the Gated Recurrent Unit (GRU) network not only alleviates the vanishing- and exploding-gradient problems of the Recurrent Neural Network but also compensates for this deficiency of the CNN. To benefit from both networks, a Convolution-Gated Recurrent Unit neural network is proposed. The network consists of two parallel paths: a CNN path that extracts features from the input data, and a GRU path that learns the long-term dependencies of the input data. Experimental results indicate that the proposed method achieves better recognition performance.
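The two fusion schemes described above can be sketched as follows. This is a minimal illustration using toy nested lists in place of real spectrograms; the function names and shapes are assumptions for exposition, not the thesis implementation.

```python
def stack_channels(mel, cochlea, cqt):
    """RGB-style fusion: three (T x F) maps become one (3 x T x F)
    tensor, analogous to the R, G, B planes of a color image."""
    return [mel, cochlea, cqt]

def cascade(mel, cochlea, cqt):
    """Cascaded fusion: concatenate the three maps frame by frame
    along the frequency axis, giving a single (T x 3F) map."""
    return [m + c + q for m, c, q in zip(mel, cochlea, cqt)]

# Toy 2-frame, 2-bin "spectrograms".
mel     = [[1.0, 2.0], [3.0, 4.0]]
cochlea = [[5.0, 6.0], [7.0, 8.0]]
cqt     = [[9.0, 0.0], [1.0, 2.0]]

stacked  = stack_channels(mel, cochlea, cqt)  # 3 channels, like RGB
cascaded = cascade(mel, cochlea, cqt)         # 2 frames x 6 bins
```

The stacked form preserves each feature as a separate channel for 2-D convolution (method 1), while the cascaded form widens each frame's feature vector (method 2).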
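The channel-attention step of the Squeeze-and-Excitation module can be sketched in plain Python. The weight matrices here are illustrative placeholders, not learned parameters from the thesis.

```python
import math

def se_recalibrate(feature_maps, w1, w2):
    """Squeeze-and-Excitation sketch: global-average-pool each channel
    (squeeze), pass the result through two small fully connected layers
    (excitation), then rescale every channel by its sigmoid weight."""
    # Squeeze: one scalar per channel via global average pooling.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid (weights are assumptions).
    h = [max(0.0, sum(w * zi for w, zi in zip(w_row, z))) for w_row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(w_row, h))))
         for w_row in w2]
    # Rescale: task-relevant channels are amplified, irrelevant ones damped.
    return [[[v * s[c] for v in row] for row in ch]
            for c, ch in enumerate(feature_maps)]
```

With weights that push one channel's gate toward 1 and the other's toward 0.5, the output shows the enhancement/suppression behavior described in the abstract.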
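The GRU path's ability to retain long-term context comes from its gating equations, sketched below for a single scalar state. Parameter names are generic assumptions; a real layer would use weight matrices and biases.

```python
import math

def gru_step(x, h_prev, p):
    """One Gated Recurrent Unit step: update gate z decides how much of
    the previous state to keep, reset gate r decides how much of it to
    use when forming the candidate state."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    z = sig(p["wz"] * x + p["uz"] * h_prev)            # update gate
    r = sig(p["wr"] * x + p["ur"] * h_prev)            # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev))
    return (1.0 - z) * h_prev + z * h_cand             # gated state update
```

Because the new state is a convex combination of the old state and the candidate, gradients can flow through the `(1 - z) * h_prev` term largely unattenuated, which is how the GRU alleviates the vanishing-gradient problem of a plain RNN.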
Keywords/Search Tags: sound event recognition, multiple feature channels, Squeeze-and-Excitation residual network, feature fusion, Convolution-Gated Recurrent Unit Neural Network