
Rare Sound Events Detection Based On Deep Neural Network

Posted on: 2019-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: K W Wang
Full Text: PDF
GTID: 2428330566477305
Subject: Instrument Science and Technology
Abstract/Summary:
Audio surveillance technology is an important component of surveillance applications. The key to intelligent audio surveillance is the automatic detection of rare sound events against background sounds. Rare sound event detection is essentially a pattern classification task composed of two parts: first, hand-crafted features are extracted with speech signal processing techniques; then a machine learning algorithm trains an effective classifier that takes these features as input. Because traditional rare sound detection models usually require a fixed-dimensional feature vector, while the durations of different rare sounds can vary widely, sliding windows of different lengths are used to capture sound segments, which are then fed to the model. This approach is time-consuming, and the predicted boundaries often lie far from the ground truth. With the growth of computing power, the emergence of large-scale datasets, and the development of deep learning algorithms, deep neural networks perform increasingly well on pattern classification tasks. This thesis studies rare sound event detection based on deep neural networks; the main contributions are summarized as follows:

(1) A new method for audio event detection and classification is proposed by extending the Region-based Fully Convolutional Networks (R-FCN) architecture. The method takes the log grayscale spectrogram of the audio signal as input and consists of two stages. In the first stage, it detects whether audio events are present by sliding convolution kernels along the time dimension and then generates candidate regions that may contain audio events through a Region Proposal Network; this can also be understood as boundary detection. In the second stage, time- and frequency-domain information is integrated to classify these candidate regions and fine-tune their boundaries using time-frequency-sensitive pooling. The method can process audio signals of arbitrary length and directly outputs the locations and classes of audio events. It achieved fifth place in the IEEE DCASE Challenge 2017 Task 2 competition.

(2) A model for boundary detection and pattern classification is constructed. To address the unsatisfactory detection of short audio events by the method in part (1), a frame-level detection model is built from convolutional and recurrent neural networks. First, features are extracted from hand-crafted LogMel features by a convolutional neural network; within this model, a feature extraction unit applies one-dimensional convolutions separately along the frequency and time dimensions and finally fuses the information from both. After convolutional feature extraction, a recurrent neural network classifies each audio frame, completing the rare sound event detection model. Finally, the missed-detection and false-detection problems caused by hard-to-detect audio frames are mitigated through the loss function.
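The first stage of method (1) can be illustrated with a minimal sketch of the region-proposal mechanics restricted to the time axis: candidate intervals of several lengths are laid out along the spectrogram's time dimension, and predicted regression offsets refine their boundaries. This is a generic RPN-style parameterization in plain numpy, not the thesis's actual network; the function names, stride, and scales are illustrative assumptions.

```python
import numpy as np

def generate_time_anchors(num_frames, stride, scales):
    """Candidate (start, end) intervals along the time axis:
    one anchor per scale at every stride-th frame (RPN style)."""
    anchors = []
    for center in range(0, num_frames, stride):
        for scale in scales:
            anchors.append((center - scale / 2.0, center + scale / 2.0))
    return np.array(anchors)

def refine_anchors(anchors, offsets):
    """Apply predicted (d_center, d_length) offsets: the usual
    box-regression parameterization reduced to one dimension."""
    centers = anchors.mean(axis=1)
    lengths = anchors[:, 1] - anchors[:, 0]
    new_centers = centers + offsets[:, 0] * lengths
    new_lengths = lengths * np.exp(offsets[:, 1])
    return np.stack([new_centers - new_lengths / 2.0,
                     new_centers + new_lengths / 2.0], axis=1)

# 100-frame clip, anchors every 10 frames at 3 scales -> 30 candidates.
anchors = generate_time_anchors(100, 10, [8, 16, 32])
refined = refine_anchors(anchors, np.zeros((len(anchors), 2)))
```

With zero offsets the refinement leaves every anchor unchanged; in the full model, a classification head scores each candidate and non-maximum suppression keeps the best-scoring intervals.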
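The feature extraction unit in method (2) filters along the frequency axis and the time axis separately before fusing the results. A toy numpy version of that two-branch idea, assuming a LogMel matrix of shape (frames, mel_bins) and fusion by concatenation (the thesis's exact kernels and fusion operator are not specified here):

```python
import numpy as np

def conv1d(x, kernel):
    """Same-length 1-D convolution of a single row or column."""
    return np.convolve(x, kernel, mode="same")

def fused_features(logmel, freq_kernel, time_kernel):
    """Two-branch 1-D feature extraction: filter each frame along
    frequency, filter each Mel bin along time, then concatenate.
    Hypothetical sketch of the fusion idea, not the trained CNN."""
    freq_branch = np.apply_along_axis(conv1d, 1, logmel, freq_kernel)
    time_branch = np.apply_along_axis(conv1d, 0, logmel, time_kernel)
    return np.concatenate([freq_branch, time_branch], axis=1)

smoothing = np.array([0.25, 0.5, 0.25])
features = fused_features(np.ones((5, 8)), smoothing, smoothing)
```

In the actual model the kernels are learned convolution filters and the fused features feed the recurrent layer that classifies each frame.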
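The abstract says hard-to-detect frames are handled through the loss function but does not name it. One common choice for down-weighting easy frames so that hard frames dominate training is the focal loss; the sketch below assumes that formulation and may differ from the thesis's actual loss.

```python
import numpy as np

def frame_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Binary focal loss per audio frame. Well-classified (easy)
    frames are scaled down by (1 - p_t)**gamma, so hard frames
    contribute most of the gradient. Assumed formulation."""
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-7, 1.0))

# A confidently wrong frame incurs far more loss than an easy one.
losses = frame_focal_loss(np.array([0.1, 0.9]), np.array([1, 1]))
```

Here both frames are positive; the first is predicted at 0.1 (hard, heavily penalized) and the second at 0.9 (easy, nearly ignored), which is exactly the missed-detection/false-detection trade-off the abstract describes.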
Keywords/Search Tags:Rare Sound Events Detection, Convolutional Neural Network, Region Proposal Networks, Region-based Fully Convolutional Networks, Recurrent Neural Network, Loss Function