
Research On Speech Command Word Recognition Based On Dual Microphone Array And Deep Learning

Posted on: 2022-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: X X Qi
Full Text: PDF
GTID: 2518306554968309
Subject: Information and Communication Engineering
Abstract/Summary:
Voice-interactive devices are changing people's lifestyles and are widely used in smart homes and in-vehicle voice systems. Waking such devices by voice, and having them receive commands accurately in a complex environment, remains a challenging problem. This thesis studies a voice command word recognition method based on neural networks and a dual microphone array.

Generally speaking, there are two approaches to speech recognition. In the first, the device collects the user's speech and transmits it to the cloud, which parses it and returns the corresponding command. The disadvantage of this approach is that it requires a network connection. The second is offline speech recognition, which needs no Internet connection and responds quickly. Offline speech recognition relies on command word recognition technology. The most direct performance index of a command word recognition system is its wake-up rate, in particular whether a high wake-up rate can be maintained in a complex environment. Other metrics include false alarm rate, overall power consumption, and user experience. The focus of this research is therefore how to design a system that remains robust in complex environments. To this end, the thesis studies neural-network methods for improving the speech recognition rate and the response time in complex environments. The main work is as follows:

1. A speech endpoint detection algorithm combined with a recurrent neural network is studied. The algorithm converts noisy speech into a spectrogram through the Fourier transform, finds the frequency active points in the spectrogram, trains a convolutional gated recurrent neural network on them, and finally uses the trained network for prediction. Three networks, a bidirectional gated recurrent unit (BiGRU) network, a bidirectional long short-term memory (BiLSTM) network, and a CNN-BiGRU network, are tested at signal-to-noise ratios (SNR) of -10 dB, 0 dB, and 10 dB. In all three conditions, the prediction accuracy of the CNN-BiGRU model is higher than that of the other two models.

2. A dual-microphone speech enhancement algorithm combining a differential microphone array with adaptive noise reduction is studied. The algorithm uses first-order differential microphone array technology, applies an adaptive algorithm for noise reduction, and finally uses the log-MMSE algorithm as a post-filter. Experiments show that the algorithm suppresses directional noise interference and improves voice quality.

3. A command word recognition algorithm based on a dual microphone array and a wide residual network is studied. Starting from the original residual module, the algorithm increases the network width and reduces the network depth while keeping the overall number of network parameters unchanged. The algorithm is combined with the dual microphone array system and trained on a dual-microphone voice data set, with power-normalized cepstral coefficients (PNCC) as the input features to the residual network. Experimental results show that, compared with the ResNet15 and ResNet18 models, the wide residual network with only three residual modules achieves higher accuracy in noisy environments on both speech command word recognition and the internal and external speaker detection task, exceeding 95% on both.
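The SNR conditions used in item 1 (-10 dB, 0 dB, and 10 dB) can be illustrated with a minimal sketch of how noise is typically scaled before being added to clean speech. This is not the thesis' data pipeline: the function names, the 440 Hz sine standing in for speech, and the Gaussian noise are all assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add it."""
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR recovered from the clean signal and the noisy mixture."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


# Hypothetical stand-ins: one second of a 440 Hz tone at 16 kHz, white Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for snr in (-10, 0, 10):
    noisy = mix_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, noisy), 2))
```

At -10 dB the noise power is ten times the speech power, which is why that condition is the hardest of the three for endpoint detection.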
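The first-order differential microphone array in item 2 can be illustrated with the textbook delay-and-subtract model: the rear microphone signal is delayed and subtracted from the front one, which places a null toward the rear. The sketch below only evaluates that directional response. The 2 cm spacing, the 343 m/s speed of sound, and the function name are assumptions, and the thesis' adaptive noise reduction and log-MMSE post-filter are not modeled.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
MIC_SPACING = 0.02      # m, hypothetical spacing; not taken from the thesis


def dma_response(freq_hz, angle_deg, spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Complex response of a delay-and-subtract pair to a plane wave.

    angle_deg is measured from the array axis (0 = front, 180 = rear).
    The internal delay equals spacing / c, which gives a cardioid pattern.
    """
    omega = 2 * math.pi * freq_hz
    internal_delay = spacing / c
    acoustic_delay = spacing * math.cos(math.radians(angle_deg)) / c
    return 1 - cmath.exp(-1j * omega * (internal_delay + acoustic_delay))


for angle in (0, 90, 180):
    print(angle, round(abs(dma_response(1000.0, angle)), 4))
```

With this choice of internal delay, a source directly behind the array cancels, which is the sense in which such an array suppresses directional interference.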
Keywords/Search Tags: voice activity detection, command word recognition, differential microphone array, dual microphone array, deep residual network, power-normalized cepstral coefficient