
Sound Event Recognition Based On Feature Fusion And Neural Network

Posted on: 2022-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2518306737954049
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of information technology, sound event recognition has received extensive attention in recent years. The technology has important value in fields such as audio surveillance and medical diagnosis. Sound event recognition consists of two stages: feature extraction and classification. For feature extraction, the spectrogram is commonly used; as the "visual language" of a sound signal, it reflects both the time-domain and frequency-domain information of the signal. For classification, the Convolutional Neural Network (CNN) is used in most cases because of its powerful feature-representation capability. However, a CNN cannot effectively model the temporal context of its input, and as the network deepens it becomes prone to vanishing or exploding gradients during training. In addition, as task requirements grow, model structures become increasingly complex, and a single feature is often insufficient to support complex tasks. To address these problems, this paper applies feature fusion and improves the CNN for sound event recognition. The main contributions are as follows:

1. A sound event recognition method based on multiple feature channels and a Squeeze-and-Excitation residual network is proposed. First, the sound signal is processed to obtain the logarithmic Mel spectrogram, the logarithmic Cochleagram, and the logarithmic Constant-Q Transform spectrogram. The three features are then fused in a manner similar to the RGB channels of an image, so that the combined feature carries complementary information and serves as the input to the Squeeze-and-Excitation residual network. This network introduces the Squeeze-and-Excitation module into a residual network: the convolutional layers extract useful information, which is passed to layers containing the residual module and the Squeeze-and-Excitation module. These layers attend to the relationships between channels, enhancing task-relevant features and suppressing irrelevant ones. Next, a global average pooling layer reduces the number of trainable parameters and the risk of overfitting. Finally, a Softmax layer classifies the sound events. Experimental results show that the proposed method achieves better recognition performance.

2. A sound event recognition method based on feature fusion and a Convolution-Gated Recurrent Unit neural network is proposed. Feature fusion cascades (concatenates) the logarithmic Mel spectrogram, the logarithmic Cochleagram, and the logarithmic Constant-Q Transform spectrogram, increasing the information content and richness of the features so that they can be analyzed and processed more effectively. Although a CNN extracts features effectively, it cannot express temporal context well; the Gated Recurrent Unit (GRU) network not only alleviates the vanishing- and exploding-gradient problems of the Recurrent Neural Network but also compensates for this deficiency of the CNN. To benefit from both networks, a Convolution-Gated Recurrent Unit neural network is proposed. The network consists of two parallel paths: a CNN path that extracts features from the input data, and a GRU path that learns the long-term dependencies of the input data. Experimental results indicate that the proposed method achieves better recognition performance.
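The two fusion schemes described above can be sketched as follows. This is a minimal illustration using toy nested lists in place of real spectrograms; the function names and shapes are assumptions for exposition, not the thesis implementation.

```python
def stack_channels(mel, cochlea, cqt):
    """RGB-style fusion: three (T x F) maps become one (3 x T x F)
    tensor, analogous to the R, G, B planes of a color image."""
    return [mel, cochlea, cqt]

def cascade(mel, cochlea, cqt):
    """Cascaded fusion: concatenate the three maps frame by frame
    along the frequency axis, giving a single (T x 3F) map."""
    return [m + c + q for m, c, q in zip(mel, cochlea, cqt)]

# Toy 2-frame, 2-bin "spectrograms".
mel     = [[1.0, 2.0], [3.0, 4.0]]
cochlea = [[5.0, 6.0], [7.0, 8.0]]
cqt     = [[9.0, 0.0], [1.0, 2.0]]

stacked  = stack_channels(mel, cochlea, cqt)  # 3 channels, like RGB
cascaded = cascade(mel, cochlea, cqt)         # 2 frames x 6 bins
```

The stacked form preserves each feature as a separate channel for 2-D convolution (method 1), while the cascaded form widens each frame's feature vector (method 2).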
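The channel-attention step of the Squeeze-and-Excitation module can be sketched in plain Python. The weight matrices here are illustrative placeholders, not learned parameters from the thesis.

```python
import math

def se_recalibrate(feature_maps, w1, w2):
    """Squeeze-and-Excitation sketch: global-average-pool each channel
    (squeeze), pass the result through two small fully connected layers
    (excitation), then rescale every channel by its sigmoid weight."""
    # Squeeze: one scalar per channel via global average pooling.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid (weights are assumptions).
    h = [max(0.0, sum(w * zi for w, zi in zip(w_row, z))) for w_row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(w_row, h))))
         for w_row in w2]
    # Rescale: task-relevant channels are amplified, irrelevant ones damped.
    return [[[v * s[c] for v in row] for row in ch]
            for c, ch in enumerate(feature_maps)]
```

With weights that push one channel's gate toward 1 and the other's toward 0.5, the output shows the enhancement/suppression behavior described in the abstract.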
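The GRU path's ability to retain long-term context comes from its gating equations, sketched below for a single scalar state. Parameter names are generic assumptions; a real layer would use weight matrices and biases.

```python
import math

def gru_step(x, h_prev, p):
    """One Gated Recurrent Unit step: update gate z decides how much of
    the previous state to keep, reset gate r decides how much of it to
    use when forming the candidate state."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    z = sig(p["wz"] * x + p["uz"] * h_prev)            # update gate
    r = sig(p["wr"] * x + p["ur"] * h_prev)            # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev))
    return (1.0 - z) * h_prev + z * h_cand             # gated state update
```

Because the new state is a convex combination of the old state and the candidate, gradients can flow through the `(1 - z) * h_prev` term largely unattenuated, which is how the GRU alleviates the vanishing-gradient problem of a plain RNN.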
Keywords/Search Tags: sound event recognition, multiple feature channels, Squeeze-and-Excitation residual network, feature fusion, Convolution-Gated Recurrent Unit Neural Network