In recent years, environmental sound classification has gradually become a research hotspot in the field of sound recognition, and it is a problem with wide applications in real life. However, most classification learning methods are single-label, that is, each object has only one label, which does not suit the complex and ambiguous real world. Multi-label classification is therefore closer to real-world conditions and has great research value. Current methods for audio multi-label classification can be divided into four broad categories: problem transformation methods, algorithm adaptation methods, ensemble methods, and deep learning methods. This research proposes a deep-learning-based method for multi-label classification of noisy environmental audio. The method also handles single-label datasets and is therefore general. First, noisy audio is distinguished from clean audio in the dataset, and all noisy audio is denoised with the RNNoise algorithm using its pretrained denoising model, yielding cleaner audio files. Next, the VGGish model is used to extract more complete audio features, and the resulting feature vectors serve as the input of the neural network model. The ResNet34 and EnvNet-v2 networks are then trained. Note that the feature vectors from the previous step cannot be used with EnvNet-v2, because that network takes the denoised audio waveform directly as input. During training, the value of each metric is tracked, and the model parameters are adjusted promptly according to the results until a model with higher accuracy is obtained. Finally, the evaluation metric is defined and the average accuracy is calculated: the trained model is loaded to predict labels for the test set and output the predicted probabilities.

Compared with traditional audio multi-label classification methods, this research is the first to add an independent noise reduction algorithm to the data preprocessing module. In addition, considering the differences between environmental audio and speech, the VGGish model is used to extract more complete feature information. Extensive experiments yield two findings for audio multi-label classification. For the classifier that takes audio features as model input, training with a semi-supervised learning method gives better experimental results. For the classifier that takes audio waveform files as model input, adding a noise reduction algorithm to the data preprocessing module significantly improves label prediction accuracy. Experiments were carried out on a multi-label dataset and on a single-label dataset. On the multi-label dataset, the model trained with the semi-supervised learning method reached an accuracy of 40.17%, and the model using audio waveform files as input reached 40.48%; both results are better than those of the benchmark method. On the single-label dataset, the model trained with the semi-supervised learning method reached an accuracy of 71.91%, and the model using audio waveform files as input reached 68.64%. Finally, analysis of the experimental results identifies the strengths and weaknesses of the current work and points out directions for future research.
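The preprocessing step described above (separating noisy from clean audio and denoising only the noisy files) can be sketched as follows. RNNoise's C API processes audio one frame at a time (`rnnoise_process_frame`), with 10 ms frames of 480 samples at 48 kHz. The sketch assumes each file already carries an `is_noisy` flag, since the abstract does not say how noisy audio is detected, and it takes a hypothetical `denoise_frame` callable standing in for a binding to RNNoise. It is an illustration of the routing and framing, not the authors' code.

```python
from typing import Callable, Dict, List, Tuple

# RNNoise operates on 10 ms frames at 48 kHz, i.e. 480 samples per frame.
FRAME_SIZE = 480

Frame = List[float]


def split_frames(signal: List[float], frame_size: int = FRAME_SIZE) -> List[Frame]:
    """Cut the signal into consecutive frames, zero-padding the final partial frame."""
    frames = []
    for start in range(0, len(signal), frame_size):
        frame = signal[start:start + frame_size]
        frame = frame + [0.0] * (frame_size - len(frame))
        frames.append(frame)
    return frames


def denoise_signal(signal: List[float], denoise_frame: Callable[[Frame], Frame]) -> List[float]:
    """Denoise frame by frame, in order.

    `denoise_frame` is a hypothetical wrapper; a real RNNoise binding keeps internal
    state across frames, so one state should be created per file and frames fed in order.
    """
    cleaned: List[float] = []
    for frame in split_frames(signal):
        cleaned.extend(denoise_frame(frame))
    return cleaned[:len(signal)]  # drop the padding added to the last frame


def preprocess_dataset(
    dataset: Dict[str, Tuple[List[float], bool]],
    denoise_frame: Callable[[Frame], Frame],
) -> Dict[str, List[float]]:
    """Map file name -> (samples, is_noisy) to file name -> samples; only noisy files are denoised."""
    return {
        name: denoise_signal(samples, denoise_frame) if is_noisy else list(samples)
        for name, (samples, is_noisy) in dataset.items()
    }
```

Clean files pass through untouched, so only the noisy subset pays the denoising cost. Audio recorded at another sample rate would need resampling to 48 kHz before being fed to RNNoise.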
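VGGish, which the abstract uses for feature extraction, does not produce one vector per clip. It emits one 128-dimensional embedding per 0.96 s patch of log-mel frames. The sketch below computes the shape of that embedding matrix for a given clip length. It follows the framing constants of the reference VGGish front end (`vggish_params.py` and `vggish_input.py` in the TensorFlow models repository) as we understand them: 16 kHz audio, a 25 ms window, a 10 ms hop, and non-overlapping 96-frame patches. How the authors then arranged these vectors as ResNet34 input is not stated and is not modelled here.

```python
# Framing constants of the reference VGGish front end (as we understand them).
SAMPLE_RATE = 16000             # audio is resampled to 16 kHz mono
STFT_WINDOW_SECONDS = 0.025     # 25 ms analysis window
STFT_HOP_SECONDS = 0.010        # 10 ms hop between log-mel frames
EXAMPLE_WINDOW_SECONDS = 0.96   # one input patch = 96 frames x 64 mel bands
EXAMPLE_HOP_SECONDS = 0.96      # patches do not overlap
EMBEDDING_SIZE = 128            # VGGish output size per patch


def num_frames(length: int, window: int, hop: int) -> int:
    """Count complete frames: 1 + floor((length - window) / hop), or 0 if too short."""
    if length < window:
        return 0
    return 1 + (length - window) // hop


def vggish_embedding_shape(duration_seconds: float) -> tuple:
    """Return (number of patches, embedding size) for a clip of the given duration."""
    samples = int(round(duration_seconds * SAMPLE_RATE))
    window = int(round(STFT_WINDOW_SECONDS * SAMPLE_RATE))   # 400 samples
    hop = int(round(STFT_HOP_SECONDS * SAMPLE_RATE))         # 160 samples
    mel_frames = num_frames(samples, window, hop)

    frames_per_second = 1.0 / STFT_HOP_SECONDS                        # 100 frames/s
    patch = int(round(EXAMPLE_WINDOW_SECONDS * frames_per_second))    # 96 frames
    patch_hop = int(round(EXAMPLE_HOP_SECONDS * frames_per_second))   # 96 frames
    return (num_frames(mel_frames, patch, patch_hop), EMBEDDING_SIZE)
```

For a 10 s clip this gives 998 log-mel frames and therefore a 10 x 128 feature matrix. Clips shorter than about 0.975 s yield no complete patch at all, so very short environmental sounds would need padding before feature extraction.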
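The abstract states that the same method covers both multi-label and single-label data and that the trained model outputs predicted probabilities. One standard way to do this, assumed here rather than taken from the source, is an independent sigmoid per class with a decision threshold for multi-label data and a top-1 choice for single-label data. The abstract also does not define its "average accuracy". The sketch uses example-based (Jaccard) accuracy, one common multi-label definition, which reduces to ordinary top-1 accuracy when every clip has exactly one true and one predicted label. The 0.5 threshold is an arbitrary assumption.

```python
import math
from typing import List, Sequence, Set


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_multi_label(logits: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Independent sigmoid per class; keep every class whose probability reaches the threshold."""
    return {k for k, z in enumerate(logits) if sigmoid(z) >= threshold}


def predict_single_label(logits: Sequence[float]) -> Set[int]:
    """Single-label case: keep only the highest-scoring class."""
    return {max(range(len(logits)), key=lambda k: logits[k])}


def example_based_accuracy(true_sets: List[Set[int]], pred_sets: List[Set[int]]) -> float:
    """Mean over clips of |Y & Y_hat| / |Y | Y_hat|; a clip where both sets are empty scores 1."""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for y, y_hat in zip(true_sets, pred_sets):
        union = y | y_hat
        total += 1.0 if not union else len(y & y_hat) / len(union)
    return total / len(true_sets)
```

For example, logits `[2.0, -1.0, 0.3]` give the label set `{0, 2}` and logits `[-2.0, 1.5, -0.1]` give `{1}`. Against true sets `{0, 1}` and `{1}`, the clip scores are 1/3 and 1, so the example-based accuracy is 2/3.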
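The abstract reports that semi-supervised training improves the feature-input classifier but does not name the semi-supervised method. Pseudo-labeling is one common family of such methods. A model trained on labeled clips scores the unlabeled clips, and only confident predictions are kept as extra training labels. The sketch below shows that selection step for multi-label probabilities. The rule (every class must be confidently present or confidently absent) and the 0.9 and 0.1 thresholds are illustrative assumptions, not the authors' procedure.

```python
from typing import List, Sequence, Set, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]],
    pos_threshold: float = 0.9,
    neg_threshold: float = 0.1,
) -> List[Tuple[int, Set[int]]]:
    """Pick unlabeled clips whose predicted probabilities are confident for every class.

    A clip is kept only if each class probability is either >= pos_threshold
    (taken as a positive pseudo-label) or <= neg_threshold (taken as absent),
    and at least one class is positive. Returns (clip index, pseudo-label set) pairs.
    """
    if not neg_threshold < pos_threshold:
        raise ValueError("neg_threshold must be below pos_threshold")
    selected = []
    for i, row in enumerate(probs):
        if all(p >= pos_threshold or p <= neg_threshold for p in row):
            positives = {k for k, p in enumerate(row) if p >= pos_threshold}
            if positives:
                selected.append((i, positives))
    return selected
```

In a typical pseudo-labeling loop, the selected clips are added to the labeled training set, the model is retrained, and the selection is repeated. Ambiguous clips, such as one scoring 0.6 on some class, are simply left out.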