Research On Environmental Sound Event Recognition Based On Deep Learning

Posted on: 2021-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: F Z Li
GTID: 2480306554965759
Subject: Electronics and Communications Engineering
Abstract/Summary:
Sound is one of the main ways in which humans perceive their surroundings. Because sound propagates in all directions and is unaffected by lighting or viewing angle, humans can rely on it to perceive the environment and make decisions effectively. Environmental sound event recognition aims to give machines the same ability: to perceive the surrounding acoustic environment automatically and make corresponding decisions. Researchers have studied environmental sound perception and recognition since the 1970s and 1980s and have achieved certain results. In recent years, international competitions on sound event recognition and detection, such as CLEAR and DCASE, have further promoted the rapid development of the field.
Despite the many research results in this field, several problems remain poorly solved. (i) Research on feature fusion is insufficient. For example, early-fusion methods are not well suited to the extraction of high-order features by a convolutional neural network. (ii) Research has focused mainly on theory, such as acoustic features and classification algorithms, while engineering applications have received too little attention. Examples include the detection of sound data streams, the expansion of environmental sound data sets, and the effect of different sampling frequencies on recognition. This thesis proposes the following solutions.
(i) Feature fusion. Because early fusion hinders the extraction of higher-order features by convolutional neural networks, a feature fusion framework based on a two-input convolutional neural network is proposed. The framework extracts two high-level features through different convolution and pooling strategies, then concatenates the high-level features and feeds them to the output layer, which outputs the classification result. This approach matches an appropriate convolution and pooling strategy to each feature. It also avoids concatenating features with different units or scales, which interferes with the ability of the convolution kernels to extract higher-order features. Evaluation on a public data set shows that the proposed fusion method outperforms both single features and existing fusion methods. The framework is also applied to the detection and recognition of car horn sounds in real scenarios, where it reaches a recall of 87.7%, a precision of 84.7%, and an F1 score of 86.2%, outperforming the other methods compared. The proposed method can therefore be applied in real scenarios.
(ii) Engineering applications. Combining theory with practice, the environmental sound event recognition algorithms are deployed in real scenes, and solutions are proposed for the difficulties encountered. For the detection of long, continuous sound recordings, an environmental sound event detection method based on sliding detection is proposed. The method comprises sound data preprocessing, acoustic feature extraction, classifier modeling and recognition, and sorting of the classification results. The resulting system is suited to detecting environmental sound events in long, continuous sound data. Tests in real scenes show that the system can embed different environmental sound recognition methods and has good detection performance. To address the shortage of environmental sound data for training, a large amount of data was collected with purpose-built collection equipment, and the data set was augmented with data augmentation techniques to improve the generalization of the recognition methods. To address the heavy network transmission load caused by high-fidelity sound data, an appropriate sampling frequency was determined through theoretical analysis and experiments, which greatly reduces that load. Finally, the environmental sound event recognition algorithm was deployed in a campus setting, where it effectively improved campus security.
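The sliding detection idea described in the abstract can be illustrated with a small sketch. The abstract does not give the window length, hop size, features, or classifier, so everything below is an assumption made for illustration: the function names `sliding_detect` and `toy_classifier`, the 1.0 s window with a 0.5 s hop, the energy threshold that stands in for feature extraction plus a trained network, and the merging of overlapping same-label windows as one possible reading of the result-sorting stage. It is not the thesis's implementation.

```python
from typing import Callable, List, Sequence, Tuple

# (start time in seconds, end time in seconds, label)
Event = Tuple[float, float, str]


def toy_classifier(window: Sequence[float]) -> str:
    """Hypothetical stand-in for feature extraction plus a trained model.

    Labels a window "horn" when its mean energy exceeds a fixed threshold.
    """
    energy = sum(x * x for x in window) / len(window)
    return "horn" if energy > 0.6 else "background"


def sliding_detect(
    samples: Sequence[float],
    sample_rate: int,
    classify: Callable[[Sequence[float]], str],
    window_s: float = 1.0,   # assumed window length
    hop_s: float = 0.5,      # assumed hop between windows
    background: str = "background",
) -> List[Event]:
    """Slide a fixed window over a long recording and collect events."""
    win = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must span at least one sample")
    if len(samples) == 0:
        return []

    events: List[Event] = []
    last_start = max(len(samples) - win, 0)
    for start in range(0, last_start + 1, hop):
        window = samples[start:start + win]
        label = classify(window)
        if label == background:
            continue
        t0 = start / sample_rate
        t1 = (start + len(window)) / sample_rate
        # Merge with the previous event when the label matches and the
        # windows touch or overlap in time.
        if events and events[-1][2] == label and t0 <= events[-1][1]:
            events[-1] = (events[-1][0], t1, label)
        else:
            events.append((t0, t1, label))
    return events
```

With a toy stream sampled at 10 Hz that is silent for 2 s, loud for 2 s, and silent again for 2 s, `sliding_detect(stream, 10, toy_classifier)` reports a single "horn" event from 2.0 s to 4.0 s. Because the classifier is passed in as a callable, a different recognition method could be swapped in without changing the detection loop, which is in the spirit of the abstract's remark that the system can embed different recognition methods.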
Keywords/Search Tags: feature fusion, convolutional neural network, environmental sound event recognition, environmental sound event detection system