Research On Speech Keyword Spotting Technology Supporting Custom

Posted on:2024-05-28

Degree:Master

Type:Thesis

Country:China

Candidate:M Du

Full Text:PDF

GTID:2568307079454404

Subject:Information and Communication Engineering

Abstract/Summary:

Keyword spotting (KWS) is the task of detecting specific words in audio. As a simple and direct means of human-computer interaction, it has attracted growing attention. However, as smart devices become widespread, detecting only predefined keywords no longer meets consumers' diverse and personalized needs, so custom keyword detection has become a priority. Because training samples for custom keywords are scarce, most existing custom keyword detection algorithms rely on large neural networks based on phoneme classification, running either locally or in the cloud. This conflicts with people's demand for privacy and with the trend toward miniaturized, low-power devices. Furthermore, supporting only Mandarin does not reflect the reality of China's many dialects. To address these issues, and building on extensive research into previous methods, this thesis designs a neural network-based dynamic template matching algorithm that detects custom keywords offline and in real time. The main contributions are as follows.

First, to enhance the neural network's feature extraction capability for speech signals, a combination of convolutional and recurrent layers is designed as a deep feature extractor, taking advantage of the temporal and spectral characteristics of the Fbank features used as network input. With properly sized convolutional kernels, phoneme-length features can be better extracted. The recurrent layers then capture the correlation of features between phonemes, making the feature representation more effective.

Second, to reduce the influence of variation in a speaker's voice at different times and to improve the generalization of the registration template, an attention mechanism is introduced after the deep feature extractor. It computes the similarity between the registration and test templates in real time to obtain attention scores, which are used to dynamically update the registration templates, thereby improving keyword detection. Compared with direct detection using the features from the deep feature extractor, accuracy improves by 5.6%. Overall, the accuracy for 11-class custom keyword detection on the GSCD dataset reaches 91.56%; on a self-made Chinese speech keyword dataset, accuracy is 91.33% and the false rejection rate (FRR) at 1 false alarm (FA) per hour is 7.80%.

Third, to enable deployment on resource-constrained and power-limited terminal devices, a voice activity detection module is designed that reduces the system's dynamic power consumption during idle periods to 5% of that in the working state. Various optimization techniques are applied on the hardware side to reduce resource usage, power consumption, and system latency. Ultimately, the system is deployed on an FPGA development board based on a Xilinx A7 chip, with a total power consumption of 0.229 W at a 10 MHz clock and a response latency of approximately 31 ms.
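
The attention-based template update described in the second contribution can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each registration (enrollment) utterance and the test utterance have already been reduced to a fixed-length embedding by the deep feature extractor, that similarity is measured by cosine similarity, and that attention scores come from a softmax over those similarities. The function names, the temperature, and the detection threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attended_template(enrolled, test, temperature=0.1):
    """Build a test-dependent template as an attention-weighted sum.

    Assumption: attention scores are a softmax over the cosine similarity
    between each registration embedding and the test embedding.
    """
    sims = [cosine(template, test) for template in enrolled]
    weights = softmax(sims, temperature)
    dim = len(test)
    merged = [sum(w * t[i] for w, t in zip(weights, enrolled)) for i in range(dim)]
    return merged, weights


def detect(enrolled, test, threshold=0.8):
    """Return (is_keyword, score) by comparing the test with the updated template."""
    template, _ = attended_template(enrolled, test)
    score = cosine(template, test)
    return score >= threshold, score


# Toy 3-dimensional embeddings standing in for extractor outputs (made-up values).
enrolled = [[1.0, 0.0, 0.2], [0.8, 0.3, 0.1], [0.1, 1.0, 0.0]]
hit, hit_score = detect(enrolled, [0.9, 0.1, 0.2])
miss, miss_score = detect(enrolled, [0.0, 0.0, 1.0])
print(hit, round(hit_score, 3), miss, round(miss_score, 3))
```

In this sketch the registration embeddings closest to the test utterance dominate the merged template, which is one plausible way for a template to adapt to variation in a speaker's voice. In practice the threshold would be tuned on held-out data, for example to reach a target false alarm rate per hour.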

Keywords/Search Tags:

Speech Keyword Spotting, Custom Keywords, Neural Network

Related items

1	Research On Speech Keyword Spotting Algorithm Based On Neural Network
2	Research On Keyword Spotting Technology Based On Neural Network
3	Research On Audio Keyword Spotting Technology Based On Neural Network
4	Research And Implementation On Chinese Speech Keywords Spotting Based On HMM
5	Study On Speech Keyword Spotting Methods Based On Deep Learning
6	Research On Human Computer Interaction Based On Speech Keyword Spotting
7	Research On Speech Keyword Spotting Technology For Mongolian
8	Research On Keyword Spotting Technology Of Chinese Speech Recognition System
9	Keyword spotting in continuous speech utterances
10	The Mandarin Continuous Speech Keyword Spotting System Medium Vocabulary