
Research On Feature Extraction Of Audio Event Detection Based On Spectrogram

Posted on: 2018-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Li
GTID: 2348330518995452
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the development of artificial intelligence, audio monitoring has become a research hot spot. Because an audio signal is one-dimensional, it requires relatively little storage and can be processed efficiently, which compensates for the high cost, high complexity, and blind spots of video surveillance systems. Audio monitoring therefore has a wide range of applications and merits further study. Detection of abnormal key audio events is an important branch of audio monitoring. Among its technical difficulties, the most critical is that the general characteristics of audio events are hard to summarize. This work considers four audio events: footsteps, glass breaking, gunshots, and screams. The spectrograms of these four event types differ clearly and discriminate well between the classes. We focus on the difficulty of feature extraction for key audio event detection and, based on the audio spectrogram, propose three feature extraction algorithms from two perspectives:

1. Feature extraction based on the directional selectivity of the spectrogram

(1) Gabor transform method
A multi-scale, multi-orientation Gabor filter bank can effectively extract information from a spectrogram. Four orientations and four frequency scales are selected, giving a total of 16 Gabor filters. To reduce redundancy, we propose three partial filter selection methods for extracting partial Gabor features. Experimental results show that these Gabor features outperform the traditional MFCC-GMM algorithm. The G (4 × 4) feature performs best: its average detection rate on clean audio is 96.1%, its recognition rate for footsteps, gunshots, and screams reaches 100%, and it is also more robust to noise across different SNRs.

(2) Projection method
Although the Gabor feature achieves a good recognition rate, its feature extraction time is long and its feature dimension is high.
The projection feature builds on the Gabor feature: the spectrogram is projected onto four directions, and 62-dimensional statistical features are then extracted. Analysis shows that the projection reflects the dynamic changes of the spectrogram but is relatively unstable. Its average recognition rate on clean audio is 89.2%, slightly better than the traditional MFCC-GMM method but much worse than the Gabor feature, and it is more susceptible to strong noise.

2. Feature extraction based on the energy distribution of the spectrogram

The algorithm divides the monochrome images of the spectrogram into N × N local blocks and then extracts the central moment parameters of each block. This feature characterizes the energy distribution of each local time-frequency region of the spectrogram and can distinguish different audio categories. Its average recognition rate is 93.7% on clean audio and 100% for the glass-breaking class. It complements the G (4 × 4) feature well: combining the two features gives an average recognition rate of 96.8% on clean audio, and the rate can reach 86.1% under different SNRs. The combination of the G (4 × 4) feature and the energy distribution feature therefore performs best.
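As a rough illustration of the Gabor filter bank described in part 1(1), the sketch below builds 16 real-valued Gabor kernels (four orientations by four scales) in plain Python and summarizes each filter response by its mean and standard deviation. The kernel size, the wavelengths, the sigma-to-wavelength ratio, the aspect ratio, and the choice of summary statistics are all assumptions made for illustration. The abstract does not specify them, so this should not be read as the exact G (4 × 4) feature.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real (cosine) part of a 2-D Gabor kernel as a size x size nested list.

    size is assumed to be odd; gamma is an assumed spatial aspect ratio.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def gabor_bank(size=7, wavelengths=(2.0, 4.0, 8.0, 16.0)):
    """Four scales x four orientations = 16 kernels (all parameter values assumed)."""
    thetas = [k * math.pi / 4 for k in range(4)]  # 0, 45, 90, 135 degrees
    return [gabor_kernel(size, lam, th, sigma=0.56 * lam)
            for lam in wavelengths for th in thetas]


def filter_valid(image, kernel):
    """2-D correlation restricted to positions where the kernel fits entirely.

    The cosine Gabor kernel is point-symmetric, so this equals convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        out_row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            out_row.append(acc)
        out.append(out_row)
    return out


def gabor_features(spectrogram, bank):
    """Mean and standard deviation of |response| per filter (assumed statistics)."""
    feats = []
    for kernel in bank:
        resp = [abs(v) for row in filter_valid(spectrogram, kernel) for v in row]
        mean = sum(resp) / len(resp)
        var = sum((v - mean) ** 2 for v in resp) / len(resp)
        feats.extend([mean, math.sqrt(var)])
    return feats
```

Calling `gabor_features(spectrogram, gabor_bank())` on a spectrogram stored as a list of rows yields two values per filter, so 32 values for the full bank. A partial Gabor feature in the spirit of the text would pass only a subset of the 16 kernels.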
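The projection method in part 1(2) can be pictured with a minimal sketch that sums a spectrogram along four directions and computes a few statistics of each resulting profile. The specific directions (rows, columns, and the two diagonals) and the statistics chosen here are assumptions. The abstract does not list the components of the 62-dimensional feature, so this sketch does not reproduce it.

```python
def project_four_directions(img):
    """Sum a 2-D array (list of rows) along four assumed directions."""
    rows, cols = len(img), len(img[0])
    row_sums = [sum(r) for r in img]
    col_sums = [sum(img[i][j] for i in range(rows)) for j in range(cols)]
    anti_diag_sums = [0.0] * (rows + cols - 1)  # cells with constant i + j
    main_diag_sums = [0.0] * (rows + cols - 1)  # cells with constant i - j
    for i in range(rows):
        for j in range(cols):
            anti_diag_sums[i + j] += img[i][j]
            main_diag_sums[i - j + cols - 1] += img[i][j]
    return row_sums, col_sums, anti_diag_sums, main_diag_sums


def profile_stats(profile):
    """Mean, variance, peak value, and normalized peak position (assumed statistics)."""
    n = len(profile)
    mean = sum(profile) / n
    var = sum((v - mean) ** 2 for v in profile) / n
    peak = max(profile)
    peak_pos = profile.index(peak) / (n - 1) if n > 1 else 0.0
    return [mean, var, peak, peak_pos]


def projection_features(img):
    """Concatenate the statistics of the four projection profiles."""
    feats = []
    for profile in project_four_directions(img):
        feats.extend(profile_stats(profile))
    return feats
```

With four statistics per direction, `projection_features` returns 16 values, far fewer than the Gabor sketch would produce. That matches the motivation given in the text for the projection feature: a cheaper, lower-dimensional description.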
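The energy distribution feature in part 2 can be sketched in the same spirit: split a single-channel spectrogram image into N × N blocks and describe each block by a few moments of its values. Which moments the thesis uses is not stated in the abstract. The mean plus the second and third central moments of the block intensities are assumed here, and the function name is hypothetical.

```python
def block_central_moments(img, n):
    """Per-block mean, 2nd and 3rd central moments over an n x n grid of blocks.

    img is a list of rows with at least n rows and n columns; the choice of
    moments is an assumption, not taken from the source.
    """
    rows, cols = len(img), len(img[0])
    feats = []
    for bi in range(n):
        r0, r1 = bi * rows // n, (bi + 1) * rows // n
        for bj in range(n):
            c0, c1 = bj * cols // n, (bj + 1) * cols // n
            vals = [img[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            mean = sum(vals) / len(vals)
            m2 = sum((v - mean) ** 2 for v in vals) / len(vals)
            m3 = sum((v - mean) ** 3 for v in vals) / len(vals)
            feats.extend([mean, m2, m3])
    return feats
```

The result has 3 × N × N values per channel. Under these assumptions, the combined feature mentioned in the text would amount to concatenating this vector with the Gabor feature vector before classification.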
Keywords/Search Tags: audio event detection, spectrogram, Gabor filter, projection, central moment