
Research On Acoustic Scene Detection Based On Deep Learning

Posted on: 2020-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: R H Zhao
Full Text: PDF
GTID: 2428330578458249
Subject: Software engineering
Abstract/Summary:
Sound carries a great deal of information about the everyday human environment and the events occurring in it. People can perceive the acoustic scene around them (seaside, street, etc.) and identify individual sound sources (waves, cars, etc.). Automatic audio detection technology has broad application prospects in content-based search of audio files and in context awareness for mobile devices, so a series of studies have been carried out on it. However, because multiple sound sources overlap and environmental noise interferes, machine-learning-based automatic sound detection remains unreliable, and considerable further research is needed before a single sound source or an acoustic scene can be identified accurately in real-world conditions.

Acoustic scene recognition refers to recognizing the content (label) of streaming media or audio recordings, whether by humans or by artificial systems. Traditional audio recognition usually depends on digital signal processing or simple classifiers; with the spread of deep learning, these traditional methods need to be improved to meet future applications. The main subject of this thesis is the use of deep learning to recognize acoustic scenes. The overall approach is to use improved deep convolutional neural networks to form a group of weak learners, one per audio feature, and then to apply ensemble learning to train a strong learner, yielding a multi-spectrogram ensemble learning system for acoustic scene recognition.

Because deep learning places certain demands on the amount of training data, this thesis first addresses the shortage of audio source files by augmenting the audio data. Specifically, the data are expanded with WaveGAN, a generative adversarial network (GAN) for audio, and with Mixup. After comparing the experimental results, the data expanded by the Mixup method are selected as the source data for the subsequent steps.

Simple machine learning methods have limited
ability to deal with complex audio signals and to represent models, so the performance of a sound scene detection model built with them cannot meet the requirements of practical applications. The author therefore adapts a deep convolutional neural network to learn from the audio data; its accuracy exceeds both the baseline system and an ordinary convolutional network, and the modified model is used in the subsequent steps.

To improve the accuracy of acoustic scene recognition, the innovation of this thesis is an ensemble learning method over multiple spectrogram features. Because different audio features are sensitive to different scenes, multiple audio features are fed to multiple weak learners. First, the data set is sampled randomly and features are extracted; these features are then fed to the weak learners for training. Second, the author applies the random forest method, one of the ensemble learning techniques, to classify acoustic scenes, with CART decision trees as the weak learners; the evaluation results show clear advantages over both the baseline system and an individual VGGNet neural network. A Stacking method is then used: the outputs of the weak learners (VGGNet models) are fed to a next-level strong learner (a multi-layer perceptron), and the audio label is obtained from the strong learner's output. After training, the whole system is evaluated on the evaluation data set, and its result is more accurate than that of the random forest method.
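The Mixup augmentation selected above blends pairs of training examples and their labels with a random weight. A minimal sketch follows; the function name, array shapes, and alpha value are illustrative choices, not taken from the thesis:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup augmentation: blend two examples and their one-hot
    labels with a weight drawn from Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Two toy "spectrogram" patches with one-hot scene labels
# (shapes and labels are illustrative only).
x_a = np.ones((4, 4))            # e.g. a log-mel patch for "street"
x_b = np.zeros((4, 4))           # e.g. a log-mel patch for "seaside"
y_a = np.array([1.0, 0.0])
y_b = np.array([0.0, 1.0])

x_mix, y_mix = mixup(x_a, y_a, x_b, y_b)
# The mixed label remains a valid distribution: y_mix sums to 1.
```

Because the same weight is applied to inputs and labels, the augmented pair stays consistent, which is what lets Mixup enlarge the training set without new recordings.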
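The two-level Stacking scheme described above (weak learners feeding a strong meta-learner) can be sketched on toy data. Here simple threshold rules stand in for the VGGNet weak learners and a least-squares linear model stands in for the multi-layer perceptron strong learner, so all names and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class problem standing in for scene labels.
n = 200
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Level 0: two deliberately weak learners, each seeing only one
# feature, standing in for the per-spectrogram VGGNet models.
def make_weak(train_x, train_y, col):
    thr = train_x[train_y == 1, col].mean()   # crude threshold rule
    return lambda x: (x[:, col] > thr).astype(float)

w0 = make_weak(X, y, 0)
w1 = make_weak(X, y, 1)

# Level 1 (Stacking): fit a meta-model on the weak-learner outputs.
# Least squares replaces the thesis's MLP purely for brevity.
Z = np.column_stack([w0(X), w1(X), np.ones(n)])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)

def strong(x):
    z = np.column_stack([w0(x), w1(x), np.ones(len(x))])
    return (z @ coef > 0.5).astype(float)

acc = (strong(X) == y).mean()
```

The meta-model sees only the weak learners' predictions, not the raw features, which is the defining property of Stacking; combining complementary weak views is what the multi-spectrogram ensemble relies on.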
Keywords/Search Tags:Acoustic Scene Classification, Deep Learning, Feature Extraction, Data Extension, Ensemble Learning