Keyword Detection In Noisy Environments

Posted on:2020-06-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y Gu

Full Text:PDF

GTID:2428330596492260

Subject:Computer technology

Abstract/Summary:

With the proliferation of smart homes and of mobile and automotive devices, speech-based human-machine interaction has become prevalent, e.g., in Google Now, Microsoft Cortana, Amazon Alexa, and Apple Siri. To achieve hands-free speech recognition, the system continuously listens for specific wake-up keywords, a process often called keyword detection (KWD) or keyword spotting (KWS), to start speech recognition. In practice, keyword detection systems typically run on small-footprint devices with low power consumption. Robustness against noise is critical for keyword detection systems in real-world environments, and a speech enhancement frontend is used to improve it. This thesis attempts to improve the robustness of the keyword detection model in three respects. First, instead of treating speech enhancement as a separate preprocessing step before the keyword detection system, the pre-trained speech enhancement frontend and the convolutional neural network (CNN) based keyword detection system are concatenated, with a feature transformation block that converts the output of the enhancement frontend into the input of the keyword detection system. The whole model is trained jointly, so that linguistic and other useful information from the keyword detection system can be back-propagated to the enhancement frontend to improve its performance. Second, to meet the small-footprint requirement of on-device deployment, a novel convolutional recurrent network is proposed, which needs fewer parameters and less computation without degrading performance. Finally, changing the input features from the power spectrum to the Mel spectrum yields less computation and better performance. Experimental results demonstrate that the proposed method significantly improves the noise robustness of the KWS system. The proposed model, the jointly trained CNN-CRN32, achieves an accuracy of 93.17% under noisy conditions, which is 64.2% higher than the baseline trained on multi-conditional data. Overall, the proposed method significantly improves the robustness of keyword detection systems.
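
The switch from power-spectrum to Mel-spectrum input can be illustrated with a minimal sketch. It is not the thesis's implementation: the sample rate (16 kHz), FFT size (512), number of Mel bands (40), the common 2595·log10(1 + f/700) Mel formula, and all function names are assumptions chosen for illustration. It shows only why the Mel representation is cheaper for later layers: each frame shrinks from 257 frequency bins to 40 band energies.

```python
import math

def hz_to_mel(f_hz):
    # Common Mel-scale formula (an assumption; the thesis does not state which variant it uses).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points, equally spaced on the Mel scale from 0 Hz to Nyquist.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, center, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= center:
                weight = (f - lo) / (center - lo)   # rising slope
            elif center < f < hi:
                weight = (hi - f) / (hi - center)   # falling slope
            else:
                weight = 0.0
            row.append(weight)
        bank.append(row)
    return bank

def power_to_mel(power_frame, bank):
    """Project one power-spectrum frame onto the Mel filterbank."""
    return [sum(w * p for w, p in zip(row, power_frame)) for row in bank]

# Hypothetical settings, not taken from the thesis.
SAMPLE_RATE, N_FFT, N_MELS = 16000, 512, 40
bank = mel_filterbank(N_MELS, N_FFT, SAMPLE_RATE)
flat_frame = [1.0] * (N_FFT // 2 + 1)      # a dummy flat power spectrum
mel_frame = power_to_mel(flat_frame, bank)
print(len(flat_frame), "power bins ->", len(mel_frame), "Mel bands")
```

Under these assumed settings, every downstream layer sees 40 values per frame instead of 257, which is one plausible source of the reduced computation the abstract reports.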

Keywords/Search Tags:

Robust Keyword Detection, Speech Enhancement, Convolutional Recurrent Neural Network, Joint-Training

Related items

1	Research On Speech Keyword Spotting Technology In Noisy Environments
2	Research On Speech Enhancement Method Based On Parallel Optimize Recurrent Neural Network
3	Research On Supervised Speech Enhancement Based On Deep Neural Networks
4	Research On Speech Enhancement Algorithms Based On Deep Learning
5	Speech Enhancement Based On Deep Neural Network And Recurrent Neural Network
6	Codebook-based Speech Enhancement Using Deep Neural Network
7	Study On Speech Enhancement Based On Deep Learning
8	Research On End-to-end Speech Enhancement Algorithm Based On Attention Joint Convolutional Network
9	Research On Speech Emotion Recognition Based On Convolutional Recurrent Neural Network
10	Noise Robust Speech Recognition Based On CNN-TDNN And Transfer Learning