Sound is the most common information carrier and conveys rich category information. Because acoustic sensors are easy to conceal, inexpensive and subject to low electromagnetic interference, acoustic target recognition is of great significance for safety supervision, military reconnaissance, ecological monitoring and smart homes. The acoustic target recognition process is divided into two parts: feature extraction and classification. At present, the mainstream hand-crafted feature extraction methods depend heavily on personal experience, and the extracted features do not adequately represent the category attributes of acoustic targets. In addition, classifiers based on traditional machine learning algorithms have difficulty modeling complex acoustic targets and cannot solve complex classification problems. As a multi-level intelligent sensing algorithm, deep learning can fully exploit the category attributes and deep features of a target. This thesis therefore applies deep learning to acoustic target recognition and studies sound feature extraction and classifier design, in order to provide a new method for recognizing acoustic targets.

The main research objects are non-speech sounds that are common in daily life, such as footsteps, thunder, bells and airplane sounds, and audio files are used as the data set. Based on the basic principles of audio recognition, an overall scheme for acoustic target recognition based on deep learning is designed.

Firstly, a logarithmic Mel feature extraction method is designed from two aspects: Mel filter design and the discrete cosine transform. The method takes the logarithm of the output obtained by passing the power spectrum of the acoustic target through a high-order contour Mel filter bank. On the one hand, the features reflect the nonlinear characteristics of human hearing; on the other hand, the low-frequency part is strengthened. Then, a multi-channel feature extraction method is designed: the sound signal is framed with windows of 512, 1024 and 2048 samples, yielding multi-channel acoustic features that carry more complete information. On this basis, a convolutional neural network model and a residual network model are designed for deep feature extraction. Finally, classification models combining global average pooling with a deep neural network are designed, and the performance of classifiers composed of a global average pooling layer and fully connected neural networks of different structures is analyzed. The classifier with global average pooling reduces the number of training parameters while maintaining recognition accuracy.

The data set is divided into a training set, a test set and a validation set in the ratio 80%, 10% and 10%. With logarithmic Mel features as the input, the multi-channel deep feature extraction model based on different window lengths as the final feature extraction model, and a global average pooling layer followed by a two-layer deep neural network as the classifier, the acoustic target recognition system achieved 90% accuracy on the test set for ten randomly chosen kinds of sound targets, and 87.16% accuracy, 85.00% recall and an F1 score of 84.85% on the validation set. The results show that the proposed method achieves good acoustic target recognition performance.
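
The multi-channel logarithmic Mel features described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. Only the window lengths (512, 1024 and 2048 samples) come from the abstract. The sampling rate, hop size, number of Mel bands, Hann window, Mel-scale formula, triangular filter shape and reflect padding are all assumptions chosen for illustration, and the function names are hypothetical. The sketch stops at the log-Mel energies and leaves out the discrete cosine transform step.

```python
import numpy as np

def hz_to_mel(f):
    # One common Mel-scale formula (an assumption; the thesis may use another).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_fft, sr, n_mels):
    """Triangular filters with unit peak, spaced evenly on the Mel scale."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel(signal, sr, win_len, hop, n_mels):
    """Frame, window, power spectrum, Mel filtering, then logarithm."""
    # Reflect-pad so frames are centred; the frame count then depends
    # only on the hop size, not on the window length.
    padded = np.pad(signal, win_len // 2, mode="reflect")
    n_frames = 1 + (len(padded) - win_len) // hop
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hanning(win_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / win_len
    mel_energy = power @ mel_filterbank(win_len, sr, n_mels).T
    return np.log(mel_energy + 1e-10)  # shape: (n_frames, n_mels)

def multi_channel_log_mel(signal, sr, win_lens=(512, 1024, 2048), hop=512, n_mels=40):
    """Stack one log-Mel map per window length as separate channels."""
    channels = [log_mel(signal, sr, w, hop, n_mels) for w in win_lens]
    return np.stack(channels, axis=0)  # shape: (n_channels, n_frames, n_mels)

if __name__ == "__main__":
    sr = 16000                                  # assumed sampling rate
    t = np.arange(sr) / sr                      # one second of audio
    tone = np.sin(2.0 * np.pi * 440.0 * t)      # synthetic stand-in for a sound clip
    features = multi_channel_log_mel(tone, sr)
    print(features.shape)                       # (3, 32, 40)
```

Using the same hop size for every window length keeps the three feature maps aligned in time, so they can be stacked like the channels of an image and passed to a convolutional network. Short windows give finer time resolution and long windows give finer frequency resolution, which is one plausible reading of the "more complete information" that the multi-channel features provide.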