
Research On Speech Signal Recognition Based On Deep Two-way GRU And Attention Mechanism

Posted on: 2021-08-06; Degree: Master; Type: Thesis
Country: China; Candidate: Q Ding; Full Text: PDF
GTID: 2518306470479994; Subject: Master of Engineering Transportation Engineering
Abstract/Summary:
Speech is an important bridge for communication between people and an important medium through which humans interact with information. With the arrival of the intelligent age and advances in machine learning and artificial intelligence, demand for more intelligent speech recognition systems keeps growing, and a large body of research on speech recognition technology has appeared as a result. The traditional GMM-HMM and DNN-HMM speech recognition systems make poor use of context when predicting the current state. Recurrent neural networks (RNNs) have therefore been proposed for building acoustic models, but RNNs have difficulty learning dependencies between distant information and current information. To address this, we propose an acoustic model based on a deep bidirectional gated recurrent unit (DBGRU). For language modeling, we propose a new language model based on the attention mechanism, which addresses both the weak ability of RNN language models to handle long text and the large amount of training text that N-gram language models require. With these new acoustic and language models, we build a speech recognition system based on a deep bidirectional GRU and the attention mechanism. The main work of this thesis is as follows:

(1) An acoustic model based on DBGRU-CTC is designed. The model is built mainly from a deep bidirectional GRU and a CTC loss function. Depth strengthens the model's ability to extract audio features, and the bidirectional GRU strengthens the network's ability to train on and process audio. The experimental results show that the word error rate of the DBGRU acoustic model is only 12.79%; compared with the traditional GMM-HMM and DNN-HMM speech recognition systems, the word error rate is reduced by 19.36% and 13.41%, respectively.

(2) A language model based on the attention mechanism is designed. The model consists mainly of a multi-head attention module and a fully connected network module. Multi-head attention gives the model the ability to process long text and to assign different attention weights to the words in a sentence, which improves the language model's expressive ability. The fully connected network gives the model better abstract representation ability through its dimension-increasing and dimension-reducing operations. The experimental results show that the accuracy of the attention-based language model reaches 91.15%, and the model's perplexity on the test set is only 38.64%.

(3) A speech recognition system based on DBGRU-CTC and the attention mechanism is designed. The DBGRU-CTC acoustic model and the attention-based language model are integrated into the speech recognition system, and experiments indicate that the designed language model raises the system's recognition rate by 0.08% compared with using the acoustic model alone. In the acoustic model comparison experiments, the recognition accuracy of the DBGRU-CTC model is 22.75%, 17.54%, and 6.42% higher than that of RNN-CTC, GRU-CTC, and MBGRU-CTC, respectively. The model's performance under different numbers of iterations is also tested, as is its performance in a noisy environment. The results show that the recognition accuracy of the DBGRU-CTC model still reaches 81.21% in the noisy environment.
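The abstract reports a word error rate and a language-model perplexity (a term sometimes rendered as "confusion" in translation) without defining either metric. The sketch below shows how such figures are commonly computed, assuming the standard definitions: edit distance divided by reference length for word error rate, and the exponential of the mean negative log-likelihood for perplexity. The function names, the whitespace tokenization, and the sample sentences are illustrative assumptions, not taken from the thesis, whose evaluation scripts may differ (for example, by scoring at the character level).

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] is the distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Assumes whitespace-separated tokens (an illustrative choice).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of the true tokens).

    token_probs holds the probability the language model assigned to
    each correct token in the test text.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # A model that spreads probability uniformly over 4 choices.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Under these definitions a lower value is better for both metrics, and perplexity is normally a plain number rather than a percentage.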
Keywords/Search Tags:Speech recognition, DBGRU, CTC, Attention mechanism