
Research On Environmental Sound Recognition Technology Based On Feature Fusion And Soft Attention Mechanism

Posted on: 2022-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Jiang
GTID: 2518306779494854
Subject: Computer Software and Application of Computer
Abstract/Summary:
In audio recognition research, environmental sound recognition refers to a computer learning and analyzing a short audio signal by simulating the auditory function of the human ear, and then assigning it a category label. Ambient sound itself conveys a great deal of important information; analyzing it helps people monitor conditions in the environment and interpret the acoustic scene. At present, the demand for environmental sound recognition and classification is particularly prominent in medical care, safety monitoring, and the prediction of ecological environment changes. Because speech signals are highly structured and well defined, whereas environmental sound signals have no common structure and are susceptible to interference from other noise, models developed for speech recognition are not well suited to environmental sound classification.

With the continuing development of artificial intelligence, deep learning is now used for environmental sound classification: a neural network is trained on suitably chosen acoustic features to recognize and classify environmental sounds. Although there is a large body of research in this field, many challenges remain. On the one hand, most existing methods use a single speech feature as the model input, and these features are represented frame by frame, ignoring the temporal order of environmental sound features, which loses some environmental sound information. On the other hand, the problem is rarely considered from the perspective of the model, and a simple neural network struggles to fully extract the global deep features contained in environmental sound features. This thesis therefore proposes the following solutions to these two problems.

(1) Because a single environmental sound feature cannot fully reflect the characteristics of an environmental sound, this thesis proposes an environmental sound recognition and classification method based on multi-feature fusion. The model framework extracts time-domain features and frequency-domain features of the environmental sound separately. Through comparative experiments, the time-domain and frequency-domain features that give the best recognition results under the same classifier are selected and fused. Finally, the fused features are compared across different classifiers, and the effect of fusion is discussed in terms of the classification evaluation criteria.

(2) To improve the model's ability to extract global information from environmental sound features, this thesis analyzes and compares different convolution methods. On this basis, different attention mechanisms are applied to the features extracted from different convolutional layers, and the experimental results are analyzed comprehensively. Finally, a comparison with mainstream algorithms yields the scheme adopted in this thesis.

The experimental results show that fusing time-domain, frequency-domain, and time-frequency features gives better classification performance than using a single feature or a pair of features. On this basis, the selected convolution method performs better than the other convolution methods compared. The study of attention mechanisms further shows that embedding the soft attention mechanism in the first layer of the convolutional neural network attends better to the deep information in the fused features.
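The two ideas in the abstract, feature fusion and soft attention, can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation. The specific features (zero-crossing rate, short-time energy, log band energies), the frame sizes, and every function name are assumptions chosen for illustration. The attention step here weights frames with an untrained scoring vector, whereas the thesis embeds soft attention inside the first layer of a convolutional neural network.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (n_frames x frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def time_domain_features(frames):
    """Per-frame zero-crossing rate and short-time energy (illustrative choices)."""
    signs = np.sign(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    energy = np.mean(frames ** 2, axis=1)
    return np.stack([zcr, energy], axis=1)

def frequency_domain_features(frames, n_bands=8):
    """Per-frame log energy in n_bands roughly equal-width spectral bands."""
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    band_energy = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(band_energy + 1e-10)

def fuse(feature_blocks):
    """Fuse by standardizing each block and concatenating along the feature axis."""
    normed = [(f - f.mean(axis=0)) / (f.std(axis=0) + 1e-8) for f in feature_blocks]
    return np.concatenate(normed, axis=1)

def soft_attention(fused, w):
    """Soft attention over frames: softmax of per-frame scores, then a weighted sum."""
    scores = fused @ w
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ fused

rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)       # random stand-in for a short audio clip
frames = frame_signal(signal)
fused = fuse([time_domain_features(frames), frequency_domain_features(frames)])
w = rng.standard_normal(fused.shape[1])  # hypothetical, untrained scoring vector
weights, clip_vector = soft_attention(fused, w)
```

Standardizing each block before concatenation keeps a feature with a large numeric range from dominating the fused vector. In a trained model the scoring vector `w` would be learned jointly with the classifier rather than drawn at random.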
Keywords/Search Tags: Environmental Sound Recognition, Voice Feature, Feature Fusion, Attention Mechanism, Convolutional Neural Network