
Research On Audio Event Classification Based On Deep Learning

Posted on: 2021-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: H X Li
GTID: 2428330632462949
Subject: Electronic and communication engineering
Abstract/Summary:
Audio event classification, the task of identifying the type of event present in an audio stream, is an active topic in audio research with wide practical application. Several difficulties in this area remain poorly solved, notably the diversity and randomness of audio events. Starting from these characteristics of audio, this thesis designs audio event classification models using current deep learning methods and verifies the proposed hypotheses through experiments. The main work of the thesis covers the following two aspects:

1. Algorithms for the audio attribute space. Because the human ear distinguishes audio events differently at high and low frequencies, and the wavelet transform offers multi-resolution analysis, the wavelet transform is used to extract audio features. Two feature extraction methods are designed, one based on the continuous wavelet transform (CWT) and one on the discrete wavelet transform (DWT). An audio classification network combining a residual network with an LSTM network is studied, and an improved structure combining the residual network with a BiLSTM is designed. To address the diversity of audio events in the time and frequency domains, a multi-scale network for audio event classification, MseCNN, is designed, drawing on the idea of multi-scale convolution kernels and on the multi-scale convolution kernel structure of Inception. Combining the continuous wavelet transform feature CWT-9 with the MseCNN model gives overall accuracies of 84.3% and 93.1% on the Urbansound and ESC-10 datasets, respectively.

2. Audio classification methods based on the attention mechanism. Considering the diversity of the distribution of audio events in the time-frequency space, and motivated by the attention the human ear exercises when perceiving sound, attention-based audio classification is studied. The attention mechanism for audio classification is conceived and tested experimentally from three angles. To address the randomness of the distribution of audio events in the time domain, a temporal attention method is designed to highlight the information in key frames. To address the diversity of the distribution of audio events in the time-frequency space, a time-frequency spatial attention method is designed to highlight key information in that space. To address the diversity of feature components produced when multi-channel networks extract audio features, a channel attention method is designed to highlight important feature components and make features in high-level spaces more discriminative. In the experiments, three sets of comparative experiments were run on the baseline network, with attention modules added at different positions in the network to analyze and verify these assumptions. Three experiments combining attention methods were then carried out, and the best attention structure was finally combined with the network model from Chapter 3. The combination of spatial-domain and channel-domain attention achieves accuracies of 85.7% and 94.9% on the Urbansound and ESC-10 datasets, respectively.
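
The abstract does not give the exact form of the attention modules, so the following is only a minimal NumPy sketch of two generic variants: softmax-weighted pooling over time frames (temporal attention) and squeeze-and-excitation style gating over feature channels (channel attention). The function names, the weight shapes, and the random inputs are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def temporal_attention(features, w):
    """Weight time frames by a learned relevance score (hypothetical form).

    features: (T, D) array, one D-dimensional feature vector per frame.
    w:        (D,) scoring vector (would be learned; random in this sketch).
    Returns the per-frame weights (T,) and the weighted clip vector (D,).
    """
    scores = features @ w        # one scalar score per frame
    weights = softmax(scores)    # weights are positive and sum to 1
    pooled = weights @ features  # weighted sum of frames
    return weights, pooled


def channel_attention(fmap, w1, w2):
    """Squeeze-and-excitation style channel gating (assumed form).

    fmap: (C, F, T) feature map: channels x frequency bins x time frames.
    w1:   (C, H) and w2: (H, C) bottleneck weights (random in this sketch).
    Returns the per-channel gates (C,) and the reweighted feature map.
    """
    squeezed = fmap.mean(axis=(1, 2))             # global average per channel
    hidden = np.maximum(squeezed @ w1, 0.0)       # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # sigmoid gate per channel
    return gates, fmap * gates[:, None, None]


rng = np.random.default_rng(0)

frames = rng.standard_normal((50, 16))  # 50 frames, 16 features each
t_weights, clip_vector = temporal_attention(frames, rng.standard_normal(16))

fmap = rng.standard_normal((8, 40, 50))  # 8 channels, 40 bins, 50 frames
c_gates, reweighted = channel_attention(
    fmap, rng.standard_normal((8, 4)), rng.standard_normal((4, 8))
)
```

In a trained network the scoring vector and bottleneck weights would be learned jointly with the classifier; here they are random, so the sketch shows only the data flow and tensor shapes.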
Keywords/Search Tags: Audio event classification, wavelet transform, multi-scale network, attention mechanism