
Research On Acoustic Target Recognition Methods Based On Deep Learning

Posted on: 2021-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: M Z Li
Full Text: PDF
GTID: 2428330602974598
Subject: Control engineering
Abstract/Summary:
Acoustic target recognition is an active research problem in sound signal processing. It aims to analyze the complex features contained in sound signals and identify the semantic information they carry, so as to recognize acoustic targets. As one of the main carriers of information, sound can be widely used in security supervision, medical monitoring, ecosystem investigation, and anti-terrorism. Two problems motivate this work: multiple sounds in complex environments are difficult to characterize effectively, and acoustic target recognition in noisy environments is prone to misjudgment and sensitive to changes in noise. The study uses three common environmental sound classification benchmark databases, ESC-10, ESC-50, and UrbanSound8K, together with a self-built database, AUDIO-5. The main research contents and innovative results of this thesis are as follows:

(1) An extensive review of the current state of research on acoustic target recognition, in China and internationally, found that deep learning-based methods often achieve better recognition performance and better generalization for multi-class sounds. It also found that existing acoustic signal feature extraction methods are limited in how well they express sound features. By studying effective acoustic feature representations and deep learning methods, the thesis seeks better feature extraction methods and network structures and applies them to acoustic target recognition in practice.

(2) Background noise differs greatly across real-world scenarios, so endpoint detection with a fixed threshold often confuses noise with the effective features of sound targets, which leads to low detection accuracy. To address this, an adaptive single-parameter dual-threshold endpoint detection method is proposed. The method effectively trims background audio segments and reduces interference from background information.

(3) Analysis shows that the optimal frequency resolution is not the same for different sounds. However, existing audio feature extraction methods, such as log-Mel features, cochleagram features, and constant-Q transform features, produce feature maps with a single frequency resolution, which limits how well multiple sounds in complex environments can be represented. To address this, a novel feature extraction method that takes multiple frequency resolutions into account is proposed, yielding multi-frequency-resolution features that represent acoustic signals more comprehensively. These features not only act as a form of data augmentation but also capture more related information in the time-frequency dimension. Experimental results show that, compared with existing single-frequency-resolution features, the proposed multi-frequency-resolution features improve recognition accuracy on the three benchmark databases ESC-10, ESC-50, and UrbanSound8K by 1.9%, 2.3%, and 1.7%, respectively.

(4) The most obvious difference between an acoustic feature map and a natural image is that the feature map often contains more background information than foreground information, so the image holds too much useless information, which hinders the extraction of useful information. To address this, the thesis designs an efficient convolutional neural network model with spatial attention, which reduces the proportion of background information extracted by the network layer by layer and focuses more attention on the foreground region, thereby reducing background noise interference. The model can also fuse three multi-frequency-resolution features through feature channels to represent the acoustic signal more comprehensively. Experimental results show that the proposed method achieves accuracies of 97.5%, 93.1%, and 95.3% on ESC-10, ESC-50, and UrbanSound8K, respectively. On ESC-10, human accuracy is 95.7%, a level that no previous method had reached; the proposed method is 1.8% higher than human accuracy and 3.3% higher than the latest existing method. On ESC-50 and UrbanSound8K, the proposed method improves on the latest existing method by 0.5% and 2.3%, respectively.

(5) Because acoustic target recognition in noisy environments is prone to misjudgment and sensitive to changes in noise, an environment-adaptive acoustic target recognition system is designed. Active calibration of acoustic target signals is used to obtain the feature information of acoustic targets under real environmental noise, and the model is adaptively optimized with it. The software and hardware of the system are implemented, and all functions are presented through a graphical interface. The self-built database AUDIO-5 is used to verify the system in a real environment. Experimental results show that the system has high stability and environmental adaptability.
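The abstract does not give the details of the adaptive single-parameter dual-threshold endpoint detection in item (2). The following is only a minimal sketch of the general idea, under stated assumptions: the function names, the 10th-percentile noise-floor estimate, the parameter `k`, the fixed 0.5 ratio between the two thresholds, and `min_range_db` are all hypothetical choices made for illustration, not the thesis's method. Both thresholds are derived from each recording's own energy range, so they move with the background noise level instead of being fixed.

```python
import numpy as np

def frame_energy_db(signal, frame_len=400, hop=160):
    """Short-time energy of each frame, in dB."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    return energies

def detect_endpoints(signal, k=0.5, frame_len=400, hop=160, min_range_db=6.0):
    """Return a list of (start_sample, end_sample) pairs for detected events.

    Assumption (not from the thesis): a single parameter k in (0, 1] sets both
    thresholds relative to the recording's estimated noise floor and its peak.
    """
    e = frame_energy_db(signal, frame_len, hop)
    floor = np.percentile(e, 10)      # hypothetical noise-floor estimate
    peak = e.max()
    if peak - floor < min_range_db:   # nothing clearly above the background
        return []
    high = floor + k * (peak - floor)        # a frame must exceed this to trigger
    low = floor + 0.5 * k * (peak - floor)   # a segment extends while above this

    segments = []
    i, n = 0, len(e)
    while i < n:
        if e[i] >= high:
            start = i
            while start > 0 and e[start - 1] >= low:
                start -= 1            # extend backward to the low threshold
            end = i
            while end + 1 < n and e[end + 1] >= low:
                end += 1              # extend forward to the low threshold
            segments.append((start * hop, end * hop + frame_len))
            i = end + 1
        else:
            i += 1
    return segments
```

Calling `detect_endpoints(x)` on a mono waveform `x` returns sample ranges that can be used to trim background-only audio before feature extraction. The high threshold guards against triggering on noise, while the low threshold keeps the quieter onset and decay of an event inside the segment.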
Keywords/Search Tags:Acoustic target recognition, Environmental sound classification, Multi-frequency resolution, Convolutional neural network, Spatial attention