Design And Implementation Of Voice Interaction System In Smart Home

Posted on:2020-12-10

Degree:Master

Type:Thesis

Country:China

Candidate:S Wang

Full Text:PDF

GTID:2428330590958226

Subject:Control theory and control engineering

Abstract/Summary:

To meet the demand for convenient control of smart home devices, intelligent voice interaction devices have gradually entered family life. The speech recognition functions of existing systems are implemented in the cloud. This approach has several problems: it cannot be used offline, it consumes network bandwidth, and it risks leaking private data. This thesis studies the theory of speech recognition and designs and implements a voice interaction system for the smart home that can be used offline. The system consists of two parts: a keyword spotting system and a large vocabulary continuous speech recognition system. The keyword spotting system, which handles device wake-up and short command recognition, is deployed on a microcontroller. I study and compare deep fully connected neural networks, convolutional neural networks, and depthwise separable convolutional neural networks on the keyword spotting task in three respects: number of parameters, recognition rate, and amount of computation at inference time. To deploy the keyword spotting model on the microcontroller, I use dynamic fixed-point quantization to further reduce the model's storage requirements and SIMD instructions to accelerate inference. The large vocabulary continuous speech recognition system can recognize long continuous sentences and is deployed on the control center node, which is equipped with an Intel Movidius Neural Compute Stick. The end-to-end speech recognition model converts audio directly to text without an intermediate phoneme representation. The model combines a convolutional neural network, a bidirectional long short-term memory network, and connectionist temporal classification to map the input sequence of Mel-frequency cepstral coefficients to the output Pinyin sequence. I then use an N-gram language model and the Viterbi algorithm to convert Pinyin to Chinese characters. The keyword spotting model achieves a 93.5% recognition rate, and SIMD instructions reduce the model's computation time on the microcontroller by about 70%. The large vocabulary continuous speech recognition system achieves an 81.7% recognition rate without language-model-assisted decoding; adding the N-gram language model raises the recognition rate to 84.4%.
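
The abstract mentions dynamic fixed-point quantization but gives no detail. As an illustration only, the following minimal Python sketch shows one common form of the idea: each tensor gets its own number of fractional bits, chosen so that its largest magnitude fits in a signed 8-bit integer. The 8-bit width, the function names, and the sample values are assumptions made for this example and are not taken from the thesis.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick the fractional bit count so the largest magnitude fits.

    Assumption: one Q-format per tensor, signed, `total_bits` wide.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed for the integer part (negative for values below 1).
    int_bits = math.ceil(math.log2(max_abs))
    return (total_bits - 1) - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(q_values, frac_bits):
    """Map the stored integers back to approximate real values."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_values]


# Hypothetical weights and activations with different dynamic ranges.
weights = [0.5, -0.25, 0.1, -0.03]
activations = [3.2, -1.5, 0.7]

w_bits = choose_frac_bits(weights)
a_bits = choose_frac_bits(activations)
q_weights = quantize(weights, w_bits)
q_activations = quantize(activations, a_bits)
```

In this sketch the small-valued weights receive more fractional bits than the larger-valued activations, which is what makes the format "dynamic": precision is spent where each tensor's range needs it, while storage drops from 32-bit floats to 8-bit integers.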

Keywords/Search Tags:

human-computer interaction, keyword spotting, large vocabulary continuous speech recognition, neural network

Related items

1	Application Of Convolutional Neural Network In Large Vocabulary Continuous Speech Recognition
2	The Performance Optimization Research On Large Vocabulary Continuous Speech Recognition
3	The Mandarin Continuous Speech Keyword Spotting System Medium Vocabulary
4	Research On Human Computer Interaction Based On Speech Keyword Spotting
5	Research On Large Vocabulary Continuous Speech Recognition Based On Deep Learning
6	A Study Of An Irrelevant Variability Normalization Based Large Vocabulary Continuous Speech Recognition
7	Establishment Of Mandarin Large Vocabulary Continuous Speech Recognition Based On Hybrid ANN/HMM Models
8	Real-time speaker-independent large vocabulary continuous speech recognition
9	Research And Implementation On Chinese Speech Keywords Spotting Based On HMM
10	Modeling lexical tones for Mandarin large vocabulary continuous speech recognition