
Research On Acoustic Scene Classification Method Based On Deep Learning And Feature Fusion

Posted on: 2021-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Huang
Full Text: PDF
GTID: 2518306476950299
Subject: Signal and Information Processing
Abstract:
Acoustic scene classification aims to determine the scene to which a piece of audio belongs by processing the audio signal and extracting acoustic features. It plays an important role in fields such as intelligent noise reduction, smartphones, and autonomous driving. With the innovation of deep learning algorithms and the growing computing power of hardware devices in recent years, deep learning methods have gradually been applied to computer vision, natural language processing, speech recognition, and other fields. Compared with traditional machine learning models, neural networks generalize well and are suited to processing massive amounts of data. In this thesis, a convolutional neural network is used to model audio feature spectrograms, and feature fusion algorithms are designed around the characteristics of the spectrogram itself to improve the accuracy of the acoustic scene classification system. The main contents of this thesis are as follows:

(1) The research significance and background of acoustic scene classification are briefly introduced, the development history of the field and the state of research at home and abroad are described, and the authoritative competitions and common datasets related to the task are highlighted.

(2) The overall framework of the acoustic scene classification system is described, and its modules (raw data partitioning, audio preprocessing, acoustic feature extraction, classifier training, test-set evaluation, and new-sample prediction) are briefly introduced. The pre-emphasis, framing, and windowing steps of audio signal preprocessing are described, and the extraction of common acoustic features such as short-time energy, short-time zero-crossing rate, the short-time Fourier transform, and Mel-frequency cepstral coefficients is presented in detail. The working principles of traditional machine learning algorithms and neural networks such as GMMs, HMMs, SVMs, and random forests are outlined.

(3) The functions of convolutional layers, pooling layers, activation functions, residual connections, and other structures in convolutional neural networks are introduced, along with two common feature fusion methods: element-wise addition and concatenation. An acoustic scene classification algorithm based on feature fusion and high/low-frequency separated convolution is designed: the log-Mel spectrogram of the audio and its first- and second-order difference spectrograms are concatenated along the channel dimension, and a deep residual network is built to model the fused spectrogram. Before the convolution operations, the fused spectrogram is split into a low-frequency component and a high-frequency component, two independent residual networks model the two components, the feature maps from the two paths are fused before the output, and classification is finally performed through a fully connected layer. To verify the effectiveness of spectral feature fusion and high/low-frequency separated convolution, three sets of comparative experiments were designed on the TUT Urban Acoustic Scenes 2019 dataset.

(4) The audio used for acoustic scene classification contains many acoustic events, and these events often overlap, so overfitting easily occurs when the spectrogram is extracted and modeled with a CNN. To address this, an acoustic scene classification algorithm based on stratified time-frequency feature fusion is proposed: a median filter from image processing is applied to the log-Mel spectrogram, splitting it into three components according to the duration of the acoustic events. The grouped convolution from the AlexNet model is used to model the three spectral components separately; before the fully connected layer, the feature maps from the grouped modeling are fused, and classification is finally performed through a Softmax layer. To mitigate the shortage of training data, the Mixup algorithm is used for data augmentation. Six comparative experiments were conducted on the TUT 2019 dataset: with and without Mixup data augmentation, the original log-Mel features, the extended HPSS layered features, and the duration-based stratified time-frequency features were each used as input to the classification system; the original log-Mel features were modeled with an ungrouped AlexNet, while the two stratified features were modeled with a grouped-convolution AlexNet. The experiments verify the effectiveness of the proposed acoustic scene classification algorithm based on stratified time-frequency feature fusion.
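The channel-wise feature fusion and high/low-frequency split described in part (3) can be sketched in NumPy. A random array stands in for a real log-Mel spectrogram (in practice it would be computed with an audio library), and the function names, delta padding scheme, and the even split point are illustrative assumptions, not details taken from the thesis:

```python
import numpy as np

def delta(spec: np.ndarray, order: int = 1) -> np.ndarray:
    """Simple time difference along the last (frame) axis, edge-padded
    so the output shape matches the input. Stands in for a proper
    regression-based delta feature."""
    d = np.diff(spec, n=1, axis=-1)
    d = np.concatenate([d[..., :1], d], axis=-1)  # repeat first column as padding
    return delta(d, order - 1) if order > 1 else d

def fuse_channels(log_mel: np.ndarray) -> np.ndarray:
    """Concatenate log-Mel, delta, and delta-delta along a new channel
    axis, giving a (3, mels, frames) tensor for a CNN."""
    return np.stack([log_mel, delta(log_mel, 1), delta(log_mel, 2)], axis=0)

def split_high_low(fused: np.ndarray) -> tuple:
    """Split the fused spectrogram into low- and high-frequency halves
    along the Mel axis (an even split; the thesis's split point is not
    specified here)."""
    mid = fused.shape[1] // 2
    return fused[:, :mid, :], fused[:, mid:, :]

rng = np.random.default_rng(0)
log_mel = rng.standard_normal((128, 431))  # 128 Mel bins, ~10 s of frames
fused = fuse_channels(log_mel)
low, high = split_high_low(fused)
print(fused.shape, low.shape, high.shape)  # (3, 128, 431) (3, 64, 431) (3, 64, 431)
```

The two halves would then be fed to two independent residual networks, with their feature maps fused before the fully connected layer.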
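The median-filter stratification in part (4) can likewise be illustrated with a minimal NumPy sketch. Filtering along the time axis keeps long-duration (stationary) energy, filtering along the frequency axis keeps short-duration (transient) energy, and a residual captures what remains; the filter sizes and the way the residual is formed are assumptions for illustration, since the thesis's exact decomposition is not given here:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_1d(spec: np.ndarray, size: int, axis: int) -> np.ndarray:
    """Median-filter `spec` along `axis` with an odd window `size`,
    edge-padded so the output shape matches the input."""
    pad = size // 2
    pad_width = [(0, 0)] * spec.ndim
    pad_width[axis] = (pad, pad)
    padded = np.pad(spec, pad_width, mode="edge")
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)

def stratify(log_mel: np.ndarray, t_size: int = 17, f_size: int = 17) -> tuple:
    """Split a log-Mel spectrogram into three duration-based components:
    long events (smooth in time), short events (smooth in frequency),
    and a residual. The residual formula is an illustrative choice."""
    long_part = median_filter_1d(log_mel, t_size, axis=1)   # along time frames
    short_part = median_filter_1d(log_mel, f_size, axis=0)  # along Mel bins
    residual = log_mel - (long_part + short_part) / 2
    return long_part, short_part, residual

rng = np.random.default_rng(1)
log_mel = rng.standard_normal((128, 431))
long_p, short_p, resid = stratify(log_mel)
print(long_p.shape, short_p.shape, resid.shape)  # each (128, 431)
```

Each of the three components would then be handled by its own group in a grouped-convolution AlexNet, with the group outputs fused before the fully connected layer.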
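The Mixup augmentation mentioned in part (4) mixes pairs of training examples and their one-hot labels with a weight drawn from a Beta distribution. A minimal sketch, with the alpha value and array shapes chosen for illustration:

```python
import numpy as np

def mixup(x1: np.ndarray, y1: np.ndarray,
          x2: np.ndarray, y2: np.ndarray,
          alpha: float = 0.2, rng=None) -> tuple:
    """Mixup: return a convex combination of two inputs and of their
    one-hot labels, with weight lam ~ Beta(alpha, alpha)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Mix two (channels, mels, frames) spectrograms from different scene classes.
rng = np.random.default_rng(0)
xa, xb = np.zeros((3, 64, 431)), np.ones((3, 64, 431))
ya, yb = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_mix, y_mix = mixup(xa, ya, xb, yb, alpha=0.2, rng=rng)
```

Because the labels are mixed along with the inputs, the network is trained against soft targets, which helps when the training data are insufficient.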
Keywords: Acoustic scene classification, Feature fusion, High/low-frequency separated convolution, Stratified time-frequency features, Grouped convolution