
Design And Implementation Of Voice Interaction System In Smart Home

Posted on: 2020-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2428330590958226
Subject: Control theory and control engineering
Abstract/Summary:
To meet the demand for convenient control of smart home devices, intelligent voice interaction devices have gradually entered family life. Existing systems implement speech recognition in the cloud, which has several drawbacks: the system cannot be used offline, it occupies network bandwidth, and it risks leaking private data. This thesis studies the theory of speech recognition and designs and implements a voice interaction system for the smart home that can be used offline. The system consists of two parts: a keyword spotting system and a large vocabulary continuous speech recognition system.

The keyword spotting system handles device wake-up and short instruction recognition and is deployed on a microcontroller. I study and compare deep fully connected neural networks, convolutional neural networks and depthwise separable convolutional neural networks on the keyword spotting task in three respects: the number of parameters, the recognition rate and the amount of computation at inference. To deploy the keyword spotting model on the microcontroller, I use dynamic fixed-point quantization to further reduce the model's storage requirements and SIMD instructions to accelerate inference.

The large vocabulary continuous speech recognition system recognizes continuous long sentences and is deployed on the control center node, which is equipped with an Intel Movidius neural network computing stick. The end-to-end speech recognition model converts audio directly to text without an intermediate phoneme representation. It combines a convolutional neural network, a bidirectional long short-term memory network and connectionist temporal classification to map the input sequence of Mel Frequency Cepstral Coefficients to the output Pinyin sequence. I then use an N-gram language model and the Viterbi algorithm to convert Pinyin to Chinese characters.

The keyword spotting model achieves a 93.5% recognition rate, and SIMD instructions reduce the model's computation time on the microcontroller by about 70%. The large vocabulary continuous speech recognition system achieves an 81.7% recognition rate without language-model-assisted decoding; after adding the N-gram language model, the recognition rate rises to 84.4%.
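
The abstract does not say how the dynamic fixed-point quantization is carried out, so the following is only a minimal sketch of the general idea and not the thesis's implementation. It assumes 8-bit signed storage and a separate power-of-two scale per tensor, chosen from that tensor's largest magnitude. The function names (`choose_frac_bits`, `quantize`, `dequantize`) and the sample values are hypothetical.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick the number of fractional bits for one tensor (assumed per-tensor scaling)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed for the integer part; negative when all values are below 0.5.
    int_bits = math.ceil(math.log2(max_abs))
    # One bit is reserved for the sign.
    return (total_bits - 1) - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed integer range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(q_values, frac_bits):
    """Recover approximate real values from the stored integers."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_values]


# Hypothetical tensors with different dynamic ranges.
weights = [0.5, -0.25, 0.1, -0.05]
activations = [3.2, -1.5, 0.7]

w_bits = choose_frac_bits(weights)      # small values get more fractional bits
a_bits = choose_frac_bits(activations)  # large values get fewer fractional bits

w_q = quantize(weights, w_bits)
a_q = quantize(activations, a_bits)
```

Because each tensor gets its own fractional bit count, small weights keep more precision than a single global format would allow, while each value still occupies one byte instead of the four bytes of a 32-bit float. A value that lands exactly on the upper power-of-two boundary (0.5 in `weights`) saturates to the largest representable integer.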
Keywords/Search Tags: human-computer interaction, keyword spotting, large vocabulary continuous speech recognition, neural network