Study On Speech Recognition Based On Deep Learning

Posted on: 2019-10-04 | Degree: Master | Type: Thesis
Country: China | Candidate: F F Zhang
GTID: 2428330566984957 | Subject: Information and Communication Engineering
Abstract/Summary:
Automatic speech recognition (ASR) mainly includes continuous speech recognition and keyword spotting (KWS), and it is an important research field for communication between humans and machines. Traditional speech recognition based on the Hidden Markov Model (HMM) gives poor results, which makes intelligent human-machine interaction difficult to realize. In addition, this thesis studies KWS in depth. KWS is a significant component of smart devices. Recently, neural networks have become an attractive choice for KWS architectures because of their superior accuracy. Since KWS applications run on tiny microcontrollers with limited memory and compute capability, the design of neural network architectures for KWS must take these constraints into account. The work of this thesis can be divided into three main parts:
(1) Continuous speech recognition based on the Hidden Markov Model and a time delay neural network (TDNN) was studied. A trained model was used to build an online decoding system. The decoder for the CVTE model in Kaldi was improved in both recognition accuracy and efficiency.
(2) We explored the application of a channel shuffle convolutional neural network to KWS. Group convolution and channel shuffle operations reduce the number of parameters and the amount of computation. By fine-tuning the model structure, the channel shuffle convolutional neural network achieves good accuracy. Under the constraint of a small model, the larger the number of groups, the higher the accuracy. Experiments verify the effectiveness of the model at different model sizes.
(3) We explored the application of an inverted residual convolutional neural network to KWS. The model adopts depthwise convolution and an inverted residual network structure. With the same number of parameters and the same amount of computation, the proposed model outperforms previous KWS variants. To further improve the inverted residual convolutional neural network, we proposed a computation-efficient model named CSIR-CNN, which replaces the convolution layer in the inverted residual architecture with group convolution and channel shuffle. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular KWS models.
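The channel shuffle operation, the weight savings of group convolution, and one plausible reason why more groups can help under a fixed size budget can be sketched in plain Python. This is a minimal illustration, not the thesis code: the function names are hypothetical, bias and batch normalization weights are ignored, and the budget argument (more groups allow wider layers for the same weight count) follows the reasoning popularized by ShuffleNet rather than any result reported in this thesis.

```python
def channel_shuffle(channels, groups):
    """Reorder channels so each output group mixes channels from every input group.

    Equivalent to reshaping the channel axis to (groups, n), transposing to
    (n, groups), and flattening back, as in ShuffleNet.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    n = len(channels) // groups  # channels per group
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


def conv_params(c_in, c_out, kernel=1, groups=1):
    """Weight count of a 2-D convolution (bias ignored); each filter sees c_in/groups inputs."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return kernel * kernel * (c_in // groups) * c_out


def widest_pointwise(budget, groups):
    """Largest channel count C (a multiple of groups) whose 1x1 group conv C -> C fits the budget.

    Hypothetical helper for illustration only.
    """
    c = groups
    while conv_params(c + groups, c + groups, groups=groups) <= budget:
        c += groups
    return c


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), 2))                  # [0, 3, 1, 4, 2, 5]
    print(conv_params(64, 64), conv_params(64, 64, groups=4))  # 4096 1024
    for g in (1, 4, 16):
        print(g, widest_pointwise(4096, g))                    # 64, 128, 256 channels
```

With a budget of 4096 weights for a 1x1 layer, a standard convolution can be only 64 channels wide, while 4 or 16 groups allow 128 or 256 channels. The shuffle keeps those groups from becoming isolated when such layers are stacked.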
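The following sketch counts weights layer by layer to show where the inverted residual design saves parameters and how grouping its pointwise convolutions shrinks it further. It assumes a MobileNetV2-style block (1x1 expansion, 3x3 depthwise, linear 1x1 projection). It also assumes that CSIR-CNN groups both 1x1 layers. The abstract does not say which layers are grouped, so the `groups` argument and the function names are hypothetical. Channel shuffle is left out of the count because it is a pure permutation with no weights, and the residual addition adds none either.

```python
def inverted_residual_params(c_in, c_out, expansion, kernel=3, groups=1):
    """Weight count of an inverted residual block (bias and batch norm ignored).

    Layout: 1x1 expansion -> kxk depthwise -> 1x1 linear projection.
    groups > 1 turns both 1x1 layers into group convolutions (hypothetical
    reading of the CSIR-CNN modification).
    """
    hidden = c_in * expansion
    if c_in % groups or hidden % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    expand = c_in * hidden // groups      # 1x1 expansion conv
    depthwise = kernel * kernel * hidden  # one kxk filter per channel
    project = hidden * c_out // groups    # 1x1 projection conv
    return expand + depthwise + project


def standard_conv_params(c_in, c_out, kernel=3):
    """Weight count of an ordinary (ungrouped) kxk convolution, for comparison."""
    return kernel * kernel * c_in * c_out


if __name__ == "__main__":
    print(inverted_residual_params(32, 32, expansion=6))            # 14016
    print(inverted_residual_params(32, 32, expansion=6, groups=4))  # 4800
    print(standard_conv_params(192, 192))                           # 331776
```

In this example, the depthwise layer on the 192-channel hidden representation costs 1728 weights, compared with 331776 for a full 3x3 convolution of the same width. The two 1x1 layers dominate the block, so grouping them cuts the total from 14016 to 4800.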
Keywords/Search Tags: Continuous Speech Recognition, Keyword Spotting, Online Decoder, Deep Learning, Channel Shuffle, Inverted Residual