
Research On Sound Event Recognition Based On Deep Learning

Posted on: 2019-05-15    Degree: Master    Type: Thesis
Country: China    Candidate: S J Wang    Full Text: PDF
GTID: 2428330596460562    Subject: Signal and Information Processing
Abstract/Summary:
As an important information carrier, sound is often used for pattern recognition. Because sound is easy to collect and is not limited by viewing angle or lighting, it can assist environmental perception, information collection, and decision making. Sound event recognition is a typical application of audio recognition and, as a research field with broad application prospects, has attracted much attention from researchers. By receiving and processing audio signals from the environment, sound event recognition technology can detect objects and the occurrence of sound events such as bird calls, gunshots, and knocks, and can quickly sense changes in the environment, such as footsteps approaching from afar. It has therefore been adopted in many fields, including security monitoring, audio content retrieval, medical monitoring, and robot perception, and has enabled new human-machine interaction methods and intelligent machine hearing systems.

Since sound event recognition technology emerged in the early 1980s, a wide variety of feature extraction methods and classification algorithms have appeared, and considerable progress has been made. Since 2006, international competitions in the field, such as CLEAR and DCASE, have been held regularly, further promoting the development of sound event recognition technology. In addition, the rise of deep learning in recent years has provided a breakthrough for the field. Deep learning takes the deep neural network as its main framework, from which networks with different structures are derived, such as convolutional neural networks, recurrent neural networks, and deep belief networks. These networks serve different purposes and have achieved great success across many application areas. This thesis therefore focuses on leveraging a variety of deep learning models to advance sound event recognition: various deep neural networks with different structures are used to extract and characterize sound event samples, further improving the recognition accuracy of sound event recognition systems. The research covers the following aspects.

Firstly, sound event recognition based on recurrent neural networks is studied. After analyzing the defects of the basic recurrent neural network, namely the vanishing and exploding gradient problems, improved recurrent architectures designed to address them, such as LSTM and GRU, are introduced. Based on the characteristics of sound events, the samples are divided into frames and MFCCs are extracted as features. A sound event recognition system based on recurrent neural networks is then constructed and compared experimentally with several traditional pattern recognition classifiers. The results show that the GRU-based model effectively exploits the natural temporal structure of sound events and handles long-term dependencies through its recurrent layers, improving recognition accuracy.

Secondly, an improved multi-scale convolutional neural network for sound event recognition is proposed. Traditional convolutional neural networks tend to discard low-level features. The proposed model adds feature layers that preserve both the low-level and high-level features of the input. To extract features at different levels effectively, convolutions are used to compress the feature maps of all convolutional layers except the top one; the condensed features are then concatenated, yielding a representation that contains features at all levels and enhances feature learning. Multi-channel spectrogram features consisting of the mel-spectrogram and its deltas are adopted, which better capture the dynamics of a sound clip. In addition, the critical mel-spectrogram parameters (FFT size, number of mel bands, and type of mel-spectrogram deltas) are discussed, and reasonable practical choices are suggested. Experimental results on the ESC-10, ESC-50, and TUT datasets show that, compared with state-of-the-art results on these standard benchmarks, the proposed method improves recognition accuracy to varying degrees.

Finally, a data augmentation method for sound event recognition based on deep convolutional generative adversarial networks (DCGAN) is proposed. Various spectrogram image features (SIF) are first extracted from the datasets, and a DCGAN is trained on them to generate similar pseudo-samples in batches. The SVM hyperplane distance and the CNN discriminant probability of each pseudo-sample are then used to select high-quality pseudo-samples for data augmentation, improving the performance of the recognition model. In addition, an adversarial training scheme is adopted in which some low-quality pseudo-samples with ambiguous categories are added to the dataset to improve the stability and robustness of the model. Spectrograms, multi-channel mel-spectrograms, and GBVS saliency maps are used in the experiments. Feature comparison and model comparison experiments on the ESC and TUT datasets demonstrate that the method effectively improves the performance of sound event classification models.
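The first contribution relies on the GRU's gating mechanism to carry information across many MFCC frames. As a minimal illustration of that recurrence (not the thesis's trained network; the dimensions, random weights, and helper names here are illustrative assumptions), the following numpy sketch runs a single GRU cell over a sequence of frames:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell: the update gate z and reset gate r control how much
    of the previous hidden state is kept, which mitigates vanishing gradients
    over long sequences."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        n = input_dim + hidden_dim
        self.Wz = rng.normal(0.0, 0.1, (hidden_dim, n))  # update gate weights
        self.Wr = rng.normal(0.0, 0.1, (hidden_dim, n))  # reset gate weights
        self.Wh = rng.normal(0.0, 0.1, (hidden_dim, n))  # candidate weights
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                          # update gate
        r = sigmoid(self.Wr @ xh)                          # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1.0 - z) * h + z * h_tilde                 # blend old and new state

def run_sequence(cell, frames):
    """Fold a sequence of feature frames into a final hidden state."""
    h = np.zeros(cell.hidden_dim)
    for x in frames:           # one MFCC frame per time step
        h = cell.step(x, h)
    return h

# Illustrative run: 13 MFCC coefficients per frame, 8 hidden units, 50 frames
cell = GRUCell(input_dim=13, hidden_dim=8)
frames = np.random.default_rng(1).normal(size=(50, 13))
h_final = run_sequence(cell, frames)
```

In a full system, `h_final` (or the per-step hidden states) would feed a softmax layer over the sound event classes.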
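The second contribution concatenates compressed features from several convolutional levels. A plausible reading (assumed here, since the abstract does not fix the details) is a 1x1 convolution over channels followed by global average pooling; the layer shapes and channel counts below are illustrative:

```python
import numpy as np

def conv1x1(fmap, w):
    """1x1 convolution: a per-pixel linear map across channels.
    fmap: (C_in, H, W), w: (C_out, C_in) -> returns (C_out, H, W)."""
    c_in, h, wd = fmap.shape
    return (w @ fmap.reshape(c_in, -1)).reshape(w.shape[0], h, wd)

def global_avg_pool(fmap):
    """Collapse each feature map to one value: (C, H, W) -> (C,)."""
    return fmap.mean(axis=(1, 2))

rng = np.random.default_rng(0)
# Hypothetical feature maps from three lower conv layers plus the top layer
low  = rng.normal(size=(16, 32, 32))
mid  = rng.normal(size=(32, 16, 16))
high = rng.normal(size=(64, 8, 8))
top  = rng.normal(size=(128, 4, 4))

k = 8  # compressed channel count per lower layer (an illustrative choice)
compressed = [conv1x1(f, rng.normal(size=(k, f.shape[0])))
              for f in (low, mid, high)]         # compress all but the top layer
descriptor = np.concatenate(
    [global_avg_pool(f) for f in compressed] + [global_avg_pool(top)]
)   # multi-scale descriptor: 3*k + 128 = 152 values
```

The concatenated `descriptor` mixes low-level detail with high-level abstraction, which is the property the multi-scale design exploits.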
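The third contribution filters DCGAN pseudo-samples by classifier confidence: confident ones augment the training set, while a few ambiguous ones are kept for adversarial-style training. A simple sketch of that selection step (the thresholds and function name are assumptions, not the thesis's exact criteria):

```python
import numpy as np

def select_pseudo_samples(probs, keep_thresh=0.9, ambiguous_band=(0.45, 0.55)):
    """Split generated samples by the CNN's confidence on them.
    probs: the maximum class probability assigned to each pseudo-sample.
    Returns indices of high-quality samples (for augmentation) and of
    ambiguous samples (kept to improve robustness)."""
    probs = np.asarray(probs)
    keep = np.where(probs >= keep_thresh)[0]
    lo, hi = ambiguous_band
    ambiguous = np.where((probs > lo) & (probs < hi))[0]
    return keep, ambiguous

# Example confidences for six generated pseudo-samples
probs = [0.97, 0.52, 0.30, 0.91, 0.48, 0.88]
keep, amb = select_pseudo_samples(probs)
# keep -> indices [0, 3]; amb -> indices [1, 4]
```

An analogous filter could use the distance of each pseudo-sample from an SVM hyperplane in place of the CNN probability, matching the abstract's dual criterion.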
Keywords/Search Tags:sound event recognition, deep learning, convolutional neural networks, recurrent neural networks, spectrogram image feature, generative adversarial networks