Deep Learning-Based And Transfer Learning-Based Environment Sound Recognition

Posted on: 2017-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Shi
GTID: 2308330503987200
Subject: Computer Science and Technology
Abstract/Summary:
Environment Sound Recognition (ESR) is an efficient way to perceive surrounding scenes and is widely used in applications such as robotic navigation, mobile robots, audio retrieval, audio forensics, and other wearable, context-aware systems. Most classical classifiers have been applied to ESR, but their performance falls well short of what is required. Deep learning, a multi-layer, high-performance approach that has proved effective at feature extraction and modeling, is used here to further improve classification performance. Audio features play an important role in ESR because audio can be collected without any constraint on viewing angle and carries information about the environment. Since the associated video is easy to gather while audio is being collected and can help improve performance, we use both audio and video information to build a Deep Neural Network-based ESR method. Furthermore, new environmental data keep appearing in real life and bring new classification requirements, but labeling the new data and retraining are always costly, so we model the new data with transfer learning.

In this work, deep learning is applied to the ESR problem, and two feature fusion methods are proposed to make full use of the associated audio and video information: a feature-based fusion method and a model-based fusion method. The feature-based fusion method combines the audio and video features and then recognizes environment sounds with a Deep Belief Network (DBN) model. The model-based fusion method connects two DBN models that are trained separately on audio features and on video features, each to its best performance; to connect them, a new DBN is trained to replace the output layers of the two existing DBNs. Experimental results show that better performance is achieved with the model-based fusion method in a DBN model.

A transfer learning strategy is applied to the ESR problem to reduce the cost of labeling and retraining. The unsupervised training of the Stacked Denoising Autoencoder (SDA) makes it easy to combine with transfer learning. We model the new data by balancing the relationship between the new data and the existing data. Experimental results show that transfer learning can classify the target data accurately and significantly improve performance. Furthermore, the training process of the DBN is studied from a transfer learning perspective, and a Uniform Pretraining (UP) method, which expands the data used for pretraining, is proposed. With it, the target data can be classified simply by fine-tuning the existing UP model. Experimental results show that the uniform pretraining strategy improves performance and makes the system more stable and robust.
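
The difference between the two fusion methods described in the abstract can be illustrated with a minimal data-flow sketch. It is not the thesis implementation: the layers below are untrained sigmoid layers with random weights standing in for pretrained DBN layers, and the feature dimensions, layer sizes, class count, and all function names are assumptions chosen only for illustration.

```python
import math
import random

def dense(x, weights, biases):
    """One sigmoid layer; weights holds one row per output unit."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Untrained stand-in for a pretrained layer (hypothetical weights)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Pass x through a stack of (weights, biases) layers."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

def feature_fusion(audio, video, layers):
    """Feature-based fusion: concatenate the inputs, then run one network."""
    return forward(audio + video, layers)

def model_fusion(audio, video, audio_layers, video_layers, joint_layers):
    """Model-based fusion: run each modality's hidden stack, concatenate the
    top hidden activations, and feed them to a new joint network that takes
    the place of the two original output layers."""
    hidden = forward(audio, audio_layers) + forward(video, video_layers)
    return forward(hidden, joint_layers)

rng = random.Random(0)
audio = [rng.random() for _ in range(13)]  # assumed 13-dim audio feature
video = [rng.random() for _ in range(20)]  # assumed 20-dim video feature
n_classes = 5                              # assumed number of sound classes

single = [random_layer(33, 16, rng), random_layer(16, n_classes, rng)]
audio_stack = [random_layer(13, 8, rng)]
video_stack = [random_layer(20, 8, rng)]
joint = [random_layer(16, 16, rng), random_layer(16, n_classes, rng)]

scores_feature = feature_fusion(audio, video, single)
scores_model = model_fusion(audio, video, audio_stack, video_stack, joint)
```

Both calls return one score per class. In the feature-based path the two modalities meet at the input, whereas in the model-based path they meet only after each modality has passed through its own separately built hidden layers.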
Keywords/Search Tags:ESR, feature fusion, deep learning, transfer learning