
Research On Audio Scene Detection Method For Intelligent Mobile Terminal

Posted on: 2021-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Jiang
Full Text: PDF
GTID: 2428330602971879
Subject: Information and Communication Engineering

Abstract/Summary:
Scene recognition for intelligent mobile terminals is an important branch of intelligent machine research, with broad application prospects in terminal positioning and navigation, path planning, and security monitoring. Audio-based scene recognition avoids the long computation cycles and poor dynamic performance of complex image processing, and it is immune to changes in illumination. Intelligent mobile terminals are equipped with high-sensitivity sensors, large-capacity memory, and high-performance CPUs, so they can not only collect audio signals but also store and process them. It is therefore of great significance to study audio scene detection methods for intelligent mobile terminals.

First, an application was designed for the intelligent mobile terminal, through which real-time audio acquisition, noise reduction, and endpoint detection are realized. A new wavelet threshold function is then proposed to improve the denoising effect: it is continuous at the threshold point and introduces an exponential contraction factor. With signal-to-noise ratio (SNR) as the evaluation criterion, experiments show that the new threshold function raises the SNR of the real-time audio signal to 19.47 dB, about 5 dB higher than traditional soft- and hard-threshold denoising. Endpoint detection is then performed on the denoised signal to judge the quality of the collected audio and remove silent segments; with endpoint-marking accuracy as the criterion, endpoint detection based on improved spectral entropy achieves an accuracy of 90.03%.

An acoustic event database of about 10 GB was built from audio signals provided by the DCASE database. After false labels and useless audio signals were eliminated, feature engineering such as manual construction and label indexing was conducted. The database consists of a physical layer, an acoustic feature layer, and a semantic layer: the physical layer stores the underlying transmission information; the acoustic feature layer stores the time-frequency-domain and cepstrum-domain feature coefficients and the Mel energy spectrum; and the semantic layer stores the ground-truth labels and model labels. In batch processing, all information for a given recording can be read directly by its audio number, which saves computation and improves the efficiency of the classification models.

Two recognition models were adopted for online real-time audio scene detection: fused audio feature parameters were fed to a random forest, and the Mel energy spectrum was fed to a convolutional neural network, and the influence of the two feature methods on the recognition results was compared and analyzed. Experimentally, the random forest with traditional feature coefficients achieves a recognition rate of about 77%, while the convolutional neural network based on the Mel energy spectrum achieves about 68%. This is because fusing the time-frequency-domain and cepstrum-domain coefficients better represents the dynamic variation of the audio signal, whereas the weighted-average channel fusion of the Mel energy spectrum discards the signal's spatial variation characteristics, resulting in greater confusion between similar scenes.
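The abstract describes the new wavelet threshold function only qualitatively: continuous at the threshold point, with an exponential contraction factor. The following is a minimal NumPy sketch of one function with those two properties; the exact formula and the decay parameter `alpha` are illustrative assumptions, not the thesis's actual definition.

```python
import numpy as np

def improved_threshold(w, lam, alpha=1.0):
    """Illustrative improved wavelet threshold function.

    Continuous at |w| = lam (the output tends to 0 as |w| -> lam+),
    and an exponential contraction factor makes it behave like soft
    thresholding near the threshold while approaching hard
    thresholding for large |w|. This is one representative
    construction with the properties stated in the abstract.
    """
    w = np.asarray(w, dtype=float)
    out = np.zeros_like(w)
    mask = np.abs(w) > lam
    # exponential contraction factor: equals lam at |w| = lam
    # (so the function is continuous there) and decays to 0 as
    # |w| grows (so large coefficients are kept almost unchanged)
    shrink = lam * np.exp(-alpha * (np.abs(w[mask]) - lam))
    out[mask] = np.sign(w[mask]) * (np.abs(w[mask]) - shrink)
    return out
```

Coefficients below the threshold are zeroed as in standard thresholding; the continuity at the threshold point removes the jump of hard thresholding, while the exponential decay removes the constant bias of soft thresholding for large coefficients.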
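The endpoint detection above relies on an improved spectral entropy, but the abstract does not state the improvement itself. As context, here is a sketch of the baseline spectral-entropy method such work typically builds on, assuming NumPy; the frame length, hop, and entropy threshold are illustrative choices, not the thesis's parameters.

```python
import numpy as np

def spectral_entropy_vad(x, fs, frame_len=0.025, hop=0.010, thresh=0.85):
    """Baseline spectral-entropy endpoint detection (sketch).

    Noise-like frames have a nearly flat spectrum and hence high
    spectral entropy; frames containing structured audio concentrate
    energy in few bins and score lower. A frame is kept when its
    entropy falls below `thresh` times the maximum possible entropy
    log2(n_bins).
    """
    n = int(frame_len * fs)
    h = int(hop * fs)
    flags = []
    for start in range(0, len(x) - n + 1, h):
        frame = x[start:start + n] * np.hanning(n)
        spec = np.abs(np.fft.rfft(frame)) ** 2
        p = spec / (spec.sum() + 1e-12)           # normalised spectrum
        ent = -np.sum(p * np.log2(p + 1e-12))     # spectral entropy (bits)
        flags.append(ent < thresh * np.log2(len(p)))
    return np.array(flags)  # True = frame kept as non-silence
```

Runs of False frames at the start and end of the recording would then be trimmed as the silent segments; the thesis's improved variant reportedly raises endpoint-marking accuracy to 90.03%.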
Keywords/Search Tags:Intelligent mobile terminal, Audio scene detection, Wavelet denoising, Feature extraction, Random forest, Convolutional neural network