
Research On Sound Event Recognition Algorithm Based On Deep Learning

Posted on: 2022-12-24 | Degree: Master | Type: Thesis
Country: China | Candidate: H Liu | Full Text: PDF
GTID: 2518306758969669 | Subject: Control Science and Engineering
Abstract/Summary:
As one of the most important research areas in non-speech audio classification, sound event recognition is widely used in audio surveillance, acoustic scene analysis, bioacoustic monitoring, medical diagnosis, and other fields. Sound is a primary carrier of information, and analyzing the information it carries can guide everyday life and production and improve their efficiency. Designing traditional feature extractors requires extensive prior knowledge and complex computation, and conventional hand-designed network models struggle to reach satisfactory accuracy when modeling sound. This thesis therefore addresses the sound event recognition task with deep learning.

To address the limited information carried by a single-channel input feature map in sound event recognition, this thesis designs multi-channel, multi-resolution features. First, log-Mel features, gammatone filter cepstral coefficients, the constant-Q transform, and chroma features are combined into a four-channel representation, so that the different features complement one another. Because different sounds are sensitive to different time scales, discriminative cues usually appear at multiple temporal resolutions; the thesis therefore also extracts multi-resolution features of the sound signal, letting the resolutions complement one another and enhancing the expressive power of the features.

To address the low accuracy of sound event recognition models, this thesis proposes a time-frequency attention module. The multi-channel, multi-resolution features introduced above inevitably bring information redundancy and background noise. The module first applies strip convolutions of different sizes to focus on the informative content along the time and frequency axes separately, then fuses the two branches with a two-dimensional convolution, thereby suppressing background noise in environmental sound and eliminating the redundant interference introduced by the multi-channel, multi-resolution input.

To address the shortage of samples for the sound event recognition task, an audio database, Audio-7, was constructed from a survey of real environments. It contains 7 sound classes and 1,050 samples in total; to keep the samples independent, each sample comes from a different audio clip. In addition, data augmentation techniques such as time stretching, pitch shifting, and adding random Gaussian noise are applied to further alleviate the problems caused by the small dataset.

Finally, experiments on ESC-10, ESC-50, and the self-built Audio-7 dataset verify the effectiveness of the multi-channel multi-resolution features, the time-frequency attention module, and the data augmentation through ablation studies. With Mel features alone and no augmentation, the accuracies on the three datasets are 89.32%, 82.76%, and 85.00% respectively; with the proposed time-frequency attention module, the multi-channel multi-resolution features, and data augmentation, the final accuracies reach 98.50%, 88.46%, and 92.50%, improvements of 9.18%, 5.70%, and 10.50% respectively. Improvements of this magnitude confirm the positive contribution of each module. Illustrative sketches of the feature construction, the attention module, and the augmentation step are given below.
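The four-channel feature can be sketched with librosa. A minimal sketch, assuming illustrative FFT/hop sizes and a 64-bin grid rather than the thesis's actual settings; the gammatone-cepstral (GFCC) channel is omitted because librosa ships no gammatone filterbank, so only the three remaining channels are stacked here.

```python
import numpy as np
import librosa
from scipy.ndimage import zoom

def multichannel_features(y, sr, n_bins=64, hop=512):
    """Stack log-Mel, CQT and chroma into a (3, n_bins, frames) array.

    The thesis uses a fourth GFCC channel, which would require a separate
    gammatone filterbank implementation and is omitted from this sketch.
    """
    # Channel 1: log-scaled Mel spectrogram.
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bins, hop_length=hop))
    # Channel 2: log-magnitude constant-Q transform.
    cqt = librosa.amplitude_to_db(
        np.abs(librosa.cqt(y=y, sr=sr, n_bins=n_bins, hop_length=hop)))
    # Channel 3: 12-bin chroma, interpolated up to n_bins rows so all
    # channels share one time-frequency grid (frame counts already match
    # because every transform uses the same hop length).
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop)
    chroma = zoom(chroma, (n_bins / chroma.shape[0], 1), order=1)
    return np.stack([mel, cqt, chroma])
```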
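The multi-resolution idea can be sketched the same way: the same log-Mel feature computed under several analysis-window lengths, all sharing one hop length so the frame grids align and the results can be stacked as channels. The window set (512, 1024, 2048) is an assumption, not the thesis's configuration.

```python
def multiresolution_logmel(y, sr, n_mels=64, hop=512,
                           win_lengths=(512, 1024, 2048)):
    # One channel per analysis window: short windows give fine time
    # resolution, long windows give fine frequency resolution.  A shared
    # hop_length keeps the frame count identical across resolutions.
    channels = [
        librosa.power_to_db(
            librosa.feature.melspectrogram(
                y=y, sr=sr, n_mels=n_mels,
                n_fft=w, win_length=w, hop_length=hop))
        for w in win_lengths
    ]
    return np.stack(channels)  # shape: (len(win_lengths), n_mels, frames)
```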
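A minimal PyTorch sketch of the strip-convolution time-frequency attention follows. The strip length, the concatenate-then-fuse channel handling, and the sigmoid gating are assumptions about the design; the thesis's exact module may differ.

```python
import torch
import torch.nn as nn

class TimeFrequencyAttention(nn.Module):
    """Strip convolutions along time and frequency, fused into a soft mask."""

    def __init__(self, channels, strip_len=7):
        super().__init__()
        # Strip convolution along the time axis (1 x k kernel).
        self.time_conv = nn.Conv2d(channels, channels,
                                   kernel_size=(1, strip_len),
                                   padding=(0, strip_len // 2))
        # Strip convolution along the frequency axis (k x 1 kernel).
        self.freq_conv = nn.Conv2d(channels, channels,
                                   kernel_size=(strip_len, 1),
                                   padding=(strip_len // 2, 0))
        # Ordinary 2-D convolution fuses the two branches into one mask.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # x: (batch, channels, freq_bins, time_frames)
        t = self.time_conv(x)
        f = self.freq_conv(x)
        mask = torch.sigmoid(self.fuse(torch.cat([t, f], dim=1)))
        # Re-weight the input: informative time-frequency regions are kept,
        # redundant or noisy regions are suppressed.
        return x * mask
```

The module is drop-in: it preserves the input shape, so it can be placed between any two convolutional stages of a recognition backbone.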
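The augmentation step can be sketched with librosa's built-in effects. The stretch range, pitch range, and noise level below are assumed values chosen for illustration, not the thesis's parameters.

```python
import numpy as np
import librosa

def augment(y, sr, rng=None):
    rng = rng or np.random.default_rng()
    # Time stretching: speed up or slow down without changing pitch.
    stretched = librosa.effects.time_stretch(y, rate=rng.uniform(0.8, 1.2))
    # Pitch shifting: up to +/- 2 semitones, duration unchanged.
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=rng.uniform(-2, 2))
    # Additive Gaussian noise at a small amplitude relative to the signal.
    noisy = y + rng.normal(0.0, 0.005 * np.abs(y).max(), size=y.shape)
    return [stretched, shifted, noisy]
```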
Keywords/Search Tags: Sound event recognition, Deep learning, Strip convolution, Multi-channel and multi-resolution features, Time-frequency attention module