
Weak Label Urban Vehicle Sounds Recognition And Detection Based On CRNN Model

Posted on: 2020-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: P Qiu
Full Text: PDF
GTID: 2392330602450351
Subject: Engineering
Abstract/Summary:
Recognizing and detecting the sound events emitted by vehicles in the city is becoming increasingly important for key technologies such as the intelligent city and intelligent transportation. At present, most research on sound recognition and detection is based on datasets with strong labels, but strongly labeled audio data is difficult to obtain. It is therefore necessary to study how to recognize and detect urban traffic vehicle sound events using datasets with weak labels. In this thesis, a weakly labeled dataset containing 17 kinds of urban traffic vehicle sounds is selected from the Audio Set database. The samples in this dataset are unevenly distributed across the sound classes, and the audio samples are both weakly labeled and multi-source. This thesis focuses on these three problems (class imbalance, weak labels, and multiple sources) and studies how to improve the accuracy of urban traffic vehicle sound recognition and detection on a weakly labeled dataset.

Because the audio data carries only weak labels, traditional sound recognition and detection algorithms are no longer applicable to this dataset. In this thesis, the time-frequency planes given by the log Mel frequency spectrum coefficients (MFSC) of the sound events are used as features, and a convolutional recurrent neural network (CRNN) is constructed as the baseline model of the sound recognition and detection system. The model, which combines a convolutional neural network (CNN) with a recurrent neural network (RNN), can make full use of the sound features and is suitable for weakly labeled sound recognition and detection tasks.

To address the unbalanced distribution of samples across sound classes, training batches are selected proportionally during training. This lets the model fully learn the features of each kind of sound, alleviates the bias in model training, and greatly improves the accuracy of the model for urban vehicle sound recognition and detection.

To address the multi-source problem, this thesis uses an importance-weighted recognition method and a multi-scale attention fusion method. The importance-weighted recognition method weights and fuses the detection results given by the model according to their degree of importance. It relies more heavily on the detection results of the frames in which the effective sounds are located and ignores the detection results of the noise. Multi-scale attention fusion adds an attention gating mechanism and multi-scale convolution fusion to the CNN part of the model. The attention gating mechanism steers the model to learn the important sound features and ignore the unimportant ones, so that the model pays more attention to the features of effective sounds. The multi-scale convolution fusion obtains and fuses multi-dimensional sound features from the model, which enriches the features themselves. Both methods improve the accuracy of the model for urban vehicle sound recognition and detection.

To further improve the performance of the model, a multi sliding window framing method is used in the RNN part of the CRNN model. In this method, the output of the CNN is segmented with several different sliding windows, the segmented features are fed into a corresponding number of RNNs for recognition and detection, and the final result is obtained by fusing the RNN outputs. This method takes full account of the features of each sound at different frame lengths, so the model learns richer features and its accuracy for sound recognition and detection improves. Finally, two multi-model fusion methods are used in this thesis. These fusion methods also greatly improve the accuracy of the model for sound recognition and detection.

Simulation experiments show that the CRNN baseline model used in this thesis is more accurate than the traditional sound recognition and detection model on the weakly labeled urban traffic vehicle sound dataset. The methods studied in the thesis, namely the importance-weighted recognition method, proportional selection of training batch data, the multi-scale attention fusion method, the multi sliding window framing method, and multi-model fusion, all improve the accuracy of the model for sound recognition and detection. On the test set, the fusion model achieves an F1 value of 57.5% for sound recognition; for detection, the ER value is 0.627 and the F1 value is 45.1%.
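As an illustration of the importance-weighted recognition idea described in the abstract, the following is a minimal sketch in plain Python. It assumes that the model produces, for one sound class, a detection probability per frame and a non-negative importance weight per frame, and that the clip-level recognition score is the weight-normalized average of the frame probabilities. The function name `importance_weighted_pool`, its parameters, and this particular fusion formula are assumptions made for illustration; the abstract does not give the exact formulation used in the thesis.

```python
def importance_weighted_pool(frame_probs, frame_weights):
    """Fuse frame-level probabilities into one clip-level score for one class.

    frame_probs   -- per-frame detection probabilities in [0, 1]
    frame_weights -- per-frame non-negative importance weights
                     (hypothetical second output of the model)

    Returns sum(w_t * p_t) / sum(w_t), or 0.0 if every weight is zero.
    """
    if len(frame_probs) != len(frame_weights):
        raise ValueError("need exactly one importance weight per frame")
    if any(w < 0 for w in frame_weights):
        raise ValueError("importance weights must be non-negative")

    total_weight = sum(frame_weights)
    if total_weight == 0:
        # No frame is considered important, so report no evidence.
        return 0.0

    weighted_sum = sum(w * p for w, p in zip(frame_weights, frame_probs))
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Four frames: the vehicle sound is active only in the middle two.
    probs = [0.1, 0.9, 0.8, 0.2]
    weights = [0.0, 1.0, 1.0, 0.0]  # noise frames receive zero importance

    plain_mean = sum(probs) / len(probs)
    pooled = importance_weighted_pool(probs, weights)
    print("plain mean:", plain_mean)
    print("importance-weighted:", pooled)
```

In this toy example the plain average over all frames is 0.5, while the weighted score is about 0.85, because the frames marked as noise contribute nothing. This matches the stated goal of relying on frames that contain the effective sound and ignoring the rest.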
Keywords/Search Tags:CRNN, MFSC, attention, model fusion, weak label, sound recognition, sound detection