Uyghur Speech Recognition Based On Deep Recurrent Neural Network

Posted on: 2022-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Z Li
Full Text: PDF
GTID: 2518306539498284
Subject: Engineering
Abstract/Summary:
With the development of artificial intelligence, speech recognition technology has found a very wide range of applications and market prospects. In automated control systems based on voice input, it helps people operate an automated keyboard quickly, and it has made human-computer interaction possible, making people's lives more convenient.

Traditional speech recognition uses a hybrid acoustic model that combines a deep neural network (DNN) with a hidden Markov model (HMM). The DNN has a relatively large number of trainable parameters, and this number can grow sharply as layers are added, so choosing a better acoustic model is essential. Our main work is to improve the acoustic model. The number of leaf nodes of the DNN model and the total number of Gaussian mixture parameters were tuned. To overcome the vanishing gradient problem during DNN training, we changed the activation function and obtained good results; selecting a deeper network also improved the recognition rate. Finally, the tuned DNN is used to align the training data labels in advance, in preparation for the subsequent training of the recurrent neural network (RNN) model.

Adding hidden layers with a large number of neurons has been shown to greatly improve the modeling ability of a DNN, but training such a network requires more computation. At present there is relatively little work on optimizing acoustic models, and the topic still needs to be studied in its specifics and details. To improve recognition accuracy, an RNN is used as the acoustic model; a bidirectional RNN performs better than an ordinary unidirectional RNN. To reduce the complexity of the trained model, this paper proposes a speech recognition method based on an acoustic model built from bidirectional improved gated recurrent units (GRU). The reset gate is removed from the unit, the ReLU activation function is used in the state update, and it is combined with batch normalization (BN) applied to the feedforward connections. The improved unit reduces model complexity and accelerates convergence, and the bidirectional structure helps the model capture temporal context from both the past and the future, which improves recognition.

Experimental results on the THUYG-20 Uyghur corpus show that, compared with the baseline traditional deep neural network, the bidirectional improved GRU reduces the absolute word error rate by 2.34%; the model also reduces the per-epoch training time by 13.4% compared with a standard bidirectional long short-term memory (LSTM) model. This paper also fuses different acoustic features so that their complementary characteristics can be combined and fully exploited. The experimental results show that this feature fusion is very helpful in improving the ASR recognition rate.
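The abstract describes the improved recurrent unit only in words: no reset gate, ReLU in the state update, and batch normalization on the feedforward connections, run in both directions. The NumPy sketch below illustrates one plausible reading of that description; it is not the thesis's implementation. The exact equations, the parameter names (Wz, Uz, Wh, Uh), the helper names, the weight initialization, and the simplified batch normalization (per-time-step batch statistics, no learned scale or shift) are all assumptions made for illustration.

```python
import numpy as np

def batch_norm(a, eps=1e-5):
    # Simplified BN (assumption): normalize each feature over the batch axis,
    # without the learned scale and shift a full implementation would have.
    return (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(n_in, n_hid, rng, scale=0.1):
    # Hypothetical parameter layout: W* act on the input (feedforward),
    # U* act on the previous hidden state (recurrent).
    return {
        "Wz": rng.normal(0.0, scale, (n_in, n_hid)),
        "Uz": rng.normal(0.0, scale, (n_hid, n_hid)),
        "Wh": rng.normal(0.0, scale, (n_in, n_hid)),
        "Uh": rng.normal(0.0, scale, (n_hid, n_hid)),
    }

def gru_no_reset(x, p):
    # x has shape (time, batch, n_in); returns (time, batch, n_hid).
    steps, batch, _ = x.shape
    h = np.zeros((batch, p["Uz"].shape[0]))
    outputs = []
    for t in range(steps):
        # Update gate: BN is applied to the feedforward term only.
        z = sigmoid(batch_norm(x[t] @ p["Wz"]) + h @ p["Uz"])
        # Candidate state: ReLU instead of tanh, and no reset gate on h.
        cand = np.maximum(0.0, batch_norm(x[t] @ p["Wh"]) + h @ p["Uh"])
        # Interpolate between the old state and the candidate.
        h = z * h + (1.0 - z) * cand
        outputs.append(h)
    return np.stack(outputs)

def bidirectional(x, p_fwd, p_bwd):
    # Run one pass forward in time and one backward, then concatenate features.
    fwd = gru_no_reset(x, p_fwd)
    bwd = gru_no_reset(x[::-1], p_bwd)[::-1]
    return np.concatenate([fwd, bwd], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(5, 4, 3))  # 5 frames, batch of 4, 3 features
    out = bidirectional(frames, init_params(3, 6, rng), init_params(3, 6, rng))
    print(out.shape)
```

Dropping the reset gate removes two weight matrices per direction compared with a standard GRU, which is consistent with the reduced model complexity the abstract reports. In this sketch the output at each frame concatenates the forward and backward hidden states, so its feature size is twice the hidden size.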
Keywords/Search Tags:speech recognition, acoustic model, acoustic feature, ReLU, Bidirectional Recurrent Neural Networks