
Audio Scene Classification Based On Deep Learning

Posted on: 2022-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Pu
Full Text: PDF
GTID: 2518306557470304
Subject: Electronics and Communications Engineering
Abstract/Summary:
Audio scene classification is the task of recognizing the environment in which a sound was recorded and assigning the recording the corresponding environment label. It can be used in audio surveillance, abnormal event detection, risk prevention and control, and other security monitoring systems. As the volume of recorded audio grows, traditional classification methods cope poorly with large amounts of data, whereas deep learning has been widely shown to have advantages in exploiting data features and building pattern recognition models. In this thesis, a convolutional neural network (CNN) is therefore used as the main model, and the audio scene classification system is improved in two respects, the dataset and the network structure: the dataset is exploited more fully without introducing additional data, and the network structure is adjusted without increasing the amount of computation. Together these changes effectively improve the recognition accuracy of some scenes. The main work of this thesis includes:

(1) The dataset is re-labeled. A good dataset is very important for classification accuracy. Although each item in most public datasets carries only a single simple label, the dataset documentation often provides additional auxiliary information. In addition to the primary classification trained on the original labels, this thesis obtains secondary classification results by analyzing the dataset and re-labeling the loosely defined classes; these secondary results are mapped back to the original categories and then fused with the primary results. Once the primary classification results are known, accuracy is often found to be unevenly distributed across categories. This thesis therefore targets the class with the lowest accuracy in the primary results, finds the class most likely to be misclassified as it, classifies the two separately by re-labeling, and fuses the secondary classification results after category mapping. Experimental results show that this method significantly improves the accuracy of some audio scene categories.

(2) ResNet units are introduced into the neural network model. As a traditional neural network grows deeper, its training time increases, and its performance first saturates and then degrades rapidly. This thesis therefore adds appropriate ResNet units to the basic network, improving the efficiency of information propagation by adding direct (shortcut) connections to the convolutional layers. The resulting CNN based on residual units achieves higher accuracy than the plain CNN, and its training time is significantly shorter for the same amount of data.
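The direct connection described in (2) can be illustrated with a minimal NumPy sketch of a single residual unit. This is not the thesis's actual network: the layer layout (convolution, ReLU, convolution, then the skip addition), the 3x3 kernel size, the channel count, and the names `conv2d_same` and `residual_unit` are all assumptions chosen for illustration.

```python
import numpy as np


def conv2d_same(x, kernels):
    """'Same'-padded 2D convolution (cross-correlation, as in most DL libraries).

    x:       (C_in, H, W) feature map, e.g. a patch of a Mel spectrogram
    kernels: (C_out, C_in, k, k) weights, k odd
    """
    c_out, _, k, _ = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    out = np.zeros((c_out, h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Shifted view of the padded input, weighted by kernel tap (i, j).
            patch = xp[:, i:i + h, j:j + w]
            out += np.einsum("oc,chw->ohw", kernels[:, :, i, j], patch)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def residual_unit(x, w1, w2):
    """y = ReLU(F(x) + x), where F is conv -> ReLU -> conv (assumed layout)."""
    f = conv2d_same(relu(conv2d_same(x, w1)), w2)
    return relu(f + x)  # "+ x" is the direct (shortcut) connection


# Hypothetical shapes: 4 channels over an 8x8 time-frequency patch.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = 0.1 * rng.standard_normal((4, 4, 3, 3))
w2 = np.zeros((4, 4, 3, 3))  # residual branch switched off
y = residual_unit(x, w1, w2)
```

With the second convolution zeroed, the residual branch contributes nothing and the unit reduces to `relu(x)`. The input passes through the shortcut unchanged, which is the property usually credited with making deeper networks easier to train.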
Keywords/Search Tags: audio classification, spectrogram, Mel filter, convolutional neural network, residual network