
Keyword Detection In Noisy Environments

Posted on: 2020-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gu
GTID: 2428330596492260
Subject: Computer technology
Abstract/Summary:
With the proliferation of smart-home, mobile, and automotive devices, speech-based human-machine interaction has become prevalent, e.g., in Google Now, Microsoft Cortana, Amazon Alexa, and Apple Siri. To enable hands-free operation, the system continuously listens for specific wake-up keywords, a process often called keyword detection (KWD) or keyword spotting (KWS), before starting speech recognition. In practice, keyword detection systems typically run on small-footprint, low-power devices, and robustness against noise is critical in real-world environments. To improve robustness, a speech enhancement frontend is introduced. This thesis improves the robustness of the keyword detection model in three ways. First, instead of treating speech enhancement as a separate preprocessing step before keyword detection, the pre-trained speech enhancement frontend and a keyword detection system based on convolutional neural networks (CNNs) are concatenated, with a feature transformation block that converts the output of the enhancement frontend into the input of the keyword detection system. The whole model is trained jointly, so linguistic and other useful information from the keyword detection system can be back-propagated to the enhancement frontend to improve its performance. Second, to meet the small-footprint requirement of on-device deployment, a novel convolutional recurrent network (CRN) is proposed that needs fewer parameters and less computation without degrading performance. Finally, changing the input features from the power spectrum to the Mel spectrum both reduces computation and improves performance. Our experimental results demonstrate that the proposed method significantly improves the noise robustness of the KWS system: the proposed jointly trained CNN-CRN32 model achieves an accuracy of 93.17% under noisy conditions, which is 64.2% higher than that of the baseline trained on multi-condition data.
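The abstract does not say what the feature transformation block computes. One plausible form, assumed here purely for illustration, maps the enhanced power spectrum to log-Mel features with a fixed triangular Mel filterbank, which also matches the switch from power-spectrum to Mel-spectrum input. The sketch below assumes 16 kHz audio, a 512-point FFT (257 frequency bins), 40 Mel bands, and the HTK Mel formula; all function names are hypothetical. It also writes out the backward pass, because joint training only works if the keyword detector's loss gradient can flow through this block back into the enhancement frontend.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # Triangle peaking at 1 on the center frequency, 0 outside [left, right].
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def power_to_log_mel(power_spec, fb, eps=1e-6):
    """Forward pass: (frames, n_bins) power spectrum -> (frames, n_mels) log-Mel."""
    return np.log(power_spec @ fb.T + eps)

def log_mel_backward(power_spec, fb, grad_feats, eps=1e-6):
    """Backward pass: given dL/d(log-Mel), return dL/d(power spectrum).

    feats = log(P @ fb.T + eps), so d feats / d mel = 1 / (mel + eps)
    and dL/dP = (grad_feats / (mel + eps)) @ fb.
    """
    mel = power_spec @ fb.T
    return (grad_feats / (mel + eps)) @ fb

# Usage with a random stand-in for the enhancement frontend's output.
rng = np.random.default_rng(0)
enhanced_power = rng.random((98, 257))
fb = mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000)
feats = power_to_log_mel(enhanced_power, fb)
print(fb.shape, feats.shape)  # (40, 257) (98, 40)
```

Under these assumed sizes, the keyword detector sees 40 values per frame instead of 257, so any layer whose cost scales with the frequency dimension does roughly 6.4 times less work. The transform has no trainable parameters, yet it is differentiable, so in a joint-training setup an autodiff framework would propagate the keyword-detection loss through it exactly as `log_mel_backward` does by hand.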
Keywords/Search Tags:Robust Keyword Detection, Speech Enhancement, Convolutional Recurrent Neural Network, Joint-Training