Study On Speech Recognition Based On Deep Learning

Posted on:2019-10-04

Degree:Master

Type:Thesis

Country:China

Candidate:F F Zhang

Full Text:PDF

GTID:2428330566984957

Subject:Information and Communication Engineering

Abstract/Summary:

Automatic speech recognition (ASR) mainly includes continuous speech recognition and keyword spotting (KWS). It is an important research field for communication between humans and machines. Traditional speech recognition based on the Hidden Markov Model gives poor results, which makes it difficult to realize intelligent interaction between humans and machines. In addition, this thesis studies KWS in depth. KWS is a significant component of smart devices. Recently, neural networks have become an attractive choice for KWS architectures because of their superior accuracy. Since KWS applications run on tiny microcontrollers with limited memory and compute capability, the design of a neural network architecture for KWS must take these constraints into account. The work of this thesis can be divided into three main parts:

(1) Continuous speech recognition based on the Hidden Markov Model and a time delay neural network was studied. A trained model was used to build an online decoding system. The decoder for the CVTE model on Kaldi was improved in recognition accuracy, and the CVTE decoder was also improved in efficiency.

(2) We explored the application of a channel shuffle convolutional neural network to KWS. Through group convolution and channel shuffle operations, the number of parameters and the amount of computation are reduced. By fine-tuning the model structure, the channel shuffle convolutional neural network achieves good accuracy. Under the constraint of a small model, the larger the number of groups, the higher the accuracy. Experiments verify the effectiveness of the model at different model sizes.

(3) We explored the application of an inverted residual convolutional neural network to KWS. The model adopts depthwise convolution and an inverted residual network structure. With the same number of parameters and the same amount of computation, the proposed model outperforms previous KWS variants. To further improve the inverted residual convolutional neural network, we propose a computation-efficient model named CSIR-CNN, which replaces the convolution layer in the inverted residual architecture with group convolution and channel shuffle. We present extensive experiments on the tradeoff between resources and accuracy and show strong performance compared with other popular KWS models.
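
The two operations named in parts (2) and (3), group convolution and channel shuffle, can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation: the abstract gives no layer sizes, so the function names `channel_shuffle` and `conv_weight_count` and the example sizes (64 channels, 3x3 kernel, 4 groups) are assumptions chosen only for illustration. Channel shuffle is shown in its commonly used form (view the channels as a groups-by-channels-per-group grid, transpose, flatten), and the weight count ignores biases.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so each output neighbourhood mixes all groups.

    `channels` is a flat list of per-channel items (indices, feature maps, ...).
    Equivalent to reshaping to (groups, per_group), transposing, and flattening.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def conv_weight_count(c_in, c_out, kernel_size, groups=1):
    """Number of weights in a 2-D convolution with a square kernel (no bias).

    With `groups` > 1 each output channel only sees c_in / groups inputs,
    so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel_size * kernel_size


# Hypothetical sizes, chosen only for illustration.
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]
print(conv_weight_count(64, 64, 3))                   # 36864 (standard convolution)
print(conv_weight_count(64, 64, 3, groups=4))         # 9216  (group convolution)
print(conv_weight_count(64, 64, 3, groups=64))        # 576   (depthwise convolution)
```

With these assumed sizes, four groups cut the weight count to a quarter, and the depthwise case (one group per channel) cuts it by a factor of 64. Because a grouped layer only mixes information inside each group, the shuffle step is what lets the next grouped layer see channels from every group.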

Keywords/Search Tags:

Continuous Speech Recognition, Keyword Spotting, Online Decoder, Deep Learning, Channel Shuffle, Inverted Residual

Related items

1	The Mandarin Continuous Speech Keyword Spotting System Medium Vocabulary
2	Research On Human Computer Interaction Based On Speech Keyword Spotting
3	Research On Speech Keyword Spotting Technology Based On Deep Learning
4	Keyword spotting in continuous speech utterances
5	Research On Keyword Spotting Technology Of Chinese Speech Recognition System
6	Rapid Keyword Spotting In Continuous Speech
7	Application And Research On Speech Recognition Technologies In Security Monitoring System
8	Research On Speech Keyword Spotting Technology Based On Deep Learning
9	Research On Speech Keyword Spotting Technology For Mongolian
10	Research On Adaption Technique In Continuous Speech Keyword Spotting System