
Design And Implementation Of Speech Recognition System Based On DNN-LSTM

Posted on: 2021-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Li
Full Text: PDF
GTID: 2518306557492574
Subject: Software engineering
Abstract/Summary:
Automatic speech recognition (ASR) converts human speech signals into the corresponding text. As the most natural means of human-computer interaction, it has long been an active research area in artificial intelligence. As a multi-layer nonlinear model, the deep neural network (DNN) has strong learning capacity and copes well with speech recognition in complex environments. The long short-term memory (LSTM) network, an improved form of the recurrent neural network, mitigates the vanishing-gradient and exploding-gradient problems. However, neural network language models of this type are generally large and slow to run. In practice, therefore, a speech recognition system often produces the top N candidate results in a first recognition pass and then passes these candidates to the neural network language model for a second pass that yields the final result. This two-pass scheme makes neural network language models usable in such systems while reducing the amount of model computation and speeding up the system.

This thesis targets a transcript system for public security interrogations. It designs and implements a large-vocabulary continuous speech recognition system based on DNN-LSTM. The main work is as follows:

1. Based on the application scenario, the functional and non-functional requirements are analyzed, the system is divided into functional modules according to the results of that analysis, and the design of each sub-module is outlined following the system's functional module diagram.

2. In the decoding module, an acoustic model based on DNN-HMM is constructed. The DNN computes the posterior probability of the phonemes corresponding to each frame of acoustic features, and the HMM models the dynamic properties of those features. The DNN is further modified with cross-layer transfer links, which pass features learned in shallow layers directly to deep layers and thereby reduce the loss in feature transfer caused by too many network layers. Experiments verify the effectiveness of the acoustic model.

3. A re-evaluation module based on LSTM is added to improve the overall recognition performance. With the original tri-gram model kept as the language model, a re-evaluation module based on an LSTM language model is added in addition to the decoding module. Experiments show that with the re-evaluation module, the overall recognition efficiency of the system is better than that of the single-pass recognition scheme.

4. Building on the work above, a real-time speech recognition system for interrogation, based on the DNN acoustic model and the LSTM language model, is designed and implemented. Test results show that the system meets the requirements.
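
The two-pass scheme described above can be illustrated with a minimal sketch of N-best rescoring. This is not the thesis's implementation: the function names, the interpolation weight, and the toy unigram scorer that stands in for the LSTM language model are all assumptions made for illustration. The sketch only shows the idea of re-ranking first-pass candidates with a second language model score.

```python
import math
from typing import Callable, List, Tuple

# A hypothesis is (text, first-pass log score from the decoder).
Hypothesis = Tuple[str, float]


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank first-pass candidates by interpolating in a second LM score.

    lm_weight is an assumed tuning parameter: 0.0 keeps the first-pass
    ranking, 1.0 ranks by the second-pass language model alone.
    """
    rescored = [
        (text, (1.0 - lm_weight) * first_pass + lm_weight * lm_logprob(text))
        for text, first_pass in nbest
    ]
    # Highest combined log score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def toy_lm_logprob(text: str) -> float:
    """Hypothetical stand-in for an LSTM language model: a unigram table."""
    probs = {"the": 0.2, "suspect": 0.1, "left": 0.1, "at": 0.1, "nine": 0.05}
    floor = 1e-4  # probability assigned to words missing from the table
    return sum(math.log(probs.get(word, floor)) for word in text.split())


# Made-up first-pass output: the decoder slightly prefers the wrong candidate.
nbest = [
    ("the suspect left at wine", -10.0),
    ("the suspect left at nine", -10.5),
]

best_text, best_score = rescore_nbest(nbest, toy_lm_logprob)[0]
print(best_text)
```

Because the expensive model scores only N candidates rather than the full search space, the second pass stays cheap. In a real system the toy scorer would be replaced by the trained LSTM language model, and the weight would be tuned on held-out data.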
Keywords/Search Tags: Automatic speech recognition, Language model, Long short-term memory network