Acoustic scene classification (ASC) aims to perceive and understand the surrounding environment by recognizing the semantic tags of a specific scene through analysis of the complex characteristics extracted from audio signals, ultimately classifying specific sound scenes. Computer-based ASC is of great significance for home automation, self-driving cars, speech recognition in complex scenes, and audio monitoring systems. However, acoustic scenes often contain a large amount of interference from non-stationary, wide-band, erratic abnormal sound signals, or from the superposition of multiple sound sources, which increases the difficulty of research in this field. To address these problems, this paper proposes a Mix-up data augmentation method based on sound pressure level, together with an acoustic scene classification method based on an attention mechanism and a multi-scale feature-fusion model. The main contributions of this paper are as follows:

(1) Extraction of multi-channel Mel energy spectrum features. This paper constructs a multi-channel Mel spectrogram feature map by concatenating the Mel energy spectrum features of the harmonic source, the percussive source, and the multi-channel fused signal, and uses it as the input feature of the proposed model.

(2) Audio data augmentation. This paper proposes a new data augmentation method. Considering the squared relationship between sound energy and amplitude, and the insensitivity of human hearing to low and high frequencies, the A-weighting and Mix-up methods are combined to mix two sound features and generate new ones.

(3) A multi-scale feature-fusion module. Built on VGG convolutional blocks, the model first takes the multi-channel Mel energy spectrum features as input; multi-scale features are then extracted by up-sampling and down-sampling followed by lateral connections, and the fused features are finally used as the input of the attention module.

(4) A new attention mechanism module. First, weights are assigned to the multi-scale fused features to obtain a probability distribution map; this map is then multiplied element-wise with the original feature map to obtain a probability feature map; the original feature map is normalized and added to the probability feature map to obtain the output of the attention module; finally, the resulting feature is fed into a Softmax classifier.

Experiments are conducted on the DCASE2019 acoustic scene development dataset and the LITIS Rouen acoustic scene dataset. Experimental results show that the recognition accuracy of the proposed method is 13.1% higher than the baseline average, demonstrating good recognition and classification performance.
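The multi-channel feature construction in contribution (1) can be sketched as follows. This is a minimal illustration, not the thesis code: it uses median-filtering harmonic/percussive separation (Fitzgerald-style HPSS) on a power spectrogram, and a crude frequency-band averaging as a stand-in for a true Mel filterbank; all function names, window sizes, and band counts are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft

def hpss(power_spec, kernel=17):
    """Median-filtering HPSS: harmonic energy is smooth along time,
    percussive energy is smooth along frequency (illustrative kernel size)."""
    harm = median_filter(power_spec, size=(1, kernel))   # smooth across time
    perc = median_filter(power_spec, size=(kernel, 1))   # smooth across frequency
    total = harm + perc + 1e-10
    return power_spec * harm / total, power_spec * perc / total

def band_log_energies(power_spec, n_bands=64):
    """Stand-in for a Mel filterbank (assumption): average frequency bins
    into n_bands groups and take log energies."""
    bins = np.array_split(power_spec, n_bands, axis=0)
    return np.log(np.stack([b.mean(axis=0) for b in bins]) + 1e-10)

def multichannel_feature(signal, sr=22050, n_bands=64):
    """Stack harmonic, percussive, and original log-band energies
    into a 3-channel feature map, as described in contribution (1)."""
    _, _, Z = stft(signal, fs=sr, nperseg=1024, noverlap=512)
    P = np.abs(Z) ** 2
    H, Pc = hpss(P)
    return np.stack([band_log_energies(x, n_bands) for x in (H, Pc, P)])
```

In a real pipeline the band pooling would be replaced by an actual Mel filterbank (e.g. `librosa.feature.melspectrogram`), and the three channels would feed the convolutional front end directly.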
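One plausible reading of the sound-pressure-level Mix-up in contribution (2) is sketched below, under stated assumptions: A-weighting gains (standard IEC 61672 curve) are applied per frequency bin, and mixing is done in the power domain because energy goes with amplitude squared. The function names, the Beta-distribution mixing coefficient, and the exact placement of the weighting are illustrative, not the thesis implementation.

```python
import numpy as np

def a_weighting_db(f):
    """Standard A-weighting curve in dB for frequencies f in Hz (f > 0)."""
    f = np.asarray(f, dtype=float)
    num = (12194.0 ** 2) * f ** 4
    den = ((f ** 2 + 20.6 ** 2)
           * np.sqrt((f ** 2 + 107.7 ** 2) * (f ** 2 + 737.9 ** 2))
           * (f ** 2 + 12194.0 ** 2))
    return 20.0 * np.log10(num / den) + 2.0   # ~0 dB at 1 kHz

def spl_mixup(p1, y1, p2, y2, freqs, alpha=0.2, rng=None):
    """Mix two power spectrograms (freq x time) and their label vectors.

    Each bin is scaled by its A-weighted power gain before mixing, so the
    mix reflects perceived loudness rather than raw energy (assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)                          # Mix-up coefficient
    w = (10.0 ** (a_weighting_db(freqs) / 10.0))[:, None]  # power-domain gains
    p_mix = lam * (w * p1) + (1.0 - lam) * (w * p2)        # mix energies
    y_mix = lam * y1 + (1.0 - lam) * y2                    # mix labels
    return p_mix, y_mix, lam
```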
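The up-sampling/down-sampling with lateral connections in contribution (3) can be sketched with a small FPN-style top-down pathway. This is a stand-in for the thesis module: average pooling replaces the strided VGG blocks, nearest-neighbour repetition replaces learned up-sampling, and the lateral connections are plain additions.

```python
import numpy as np

def avg_pool2(x):
    """2x down-sampling by 2x2 average pooling (H and W assumed even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """2x nearest-neighbour up-sampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def multiscale_fuse(feat):
    """Build two coarser scales by pooling, then up-sample each and add it
    back to the next finer level (lateral connections), returning a fused
    map at the input resolution."""
    coarse = avg_pool2(feat)
    coarser = avg_pool2(coarse)
    merged_coarse = coarse + upsample2(coarser)    # lateral connection
    return feat + upsample2(merged_coarse)         # lateral connection
```

In the real model each scale would pass through convolutional blocks before fusion; only the wiring of the pathway is illustrated here.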
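The four steps of the attention module in contribution (4) can be sketched numerically as below. The score map `w` stands in for the learned weights (in the real model these would come from a convolutional layer), and standardization is assumed for the "normalize" step; both are labeled assumptions.

```python
import numpy as np

def softmax(z, axis=None):
    z = z - np.max(z, axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention_block(feat, w):
    """Sketch of the described attention module on a 2-D feature map.

    feat : (H, W) feature map; w : (H, W) score map (assumption: learned).
    """
    # Step 1: assign weights -> probability distribution map over positions
    prob = softmax((feat * w).ravel()).reshape(feat.shape)
    # Step 2: element-wise product with the original map -> probability feature map
    prob_feat = prob * feat
    # Step 3: normalize the original map (assumption: standardization) and add
    norm = (feat - feat.mean()) / (feat.std() + 1e-8)
    return norm + prob_feat

def classify(att_out, W_cls, b_cls):
    """Step 4: global-average pool, then a Softmax classifier (illustrative)."""
    v = np.array([att_out.mean()])
    return softmax(v @ W_cls + b_cls)
```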