Research on Speech Recognition Based on Kaldi

Posted on: 2022-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
GTID: 2518306557470344
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the continuous development of artificial intelligence, the ways in which humans interact with computers keep evolving. Voice interaction, a more intelligent and efficient mode of interaction, has therefore become a focus of attention. Its first step is speech recognition, which converts human speech into instructions a machine can process. The rise of neural networks and steady gains in computing power have created favorable conditions for speech recognition technology. As a result, recognition accuracy has reached a high level, and the technology is increasingly used in everyday applications.

This thesis covers the construction and optimization of a speech recognition system based on Kaldi. First, it introduces the basic principles of speech recognition, including how speech signals are produced, how they are analyzed, and how features are extracted from them. Second, building on Kaldi's underlying technology, it focuses on the Weighted Finite-State Transducer (WFST) framework in Kaldi: it analyzes how WFSTs are implemented and how the decoding graph is constructed from them, and presents a method for constructing WFSTs online. Kaldi is then used to train a GMM-HMM model, and the principles of acoustic modeling with GMMs and HMMs are analyzed together with the model's training method. In addition, the thesis studies how each module of Kaldi's acoustic model is implemented and how the modules interact, and trains a baseline monophone system. To handle coarticulation in real-world speech, a triphone model is then built on top of the monophone model. During triphone training, unsupervised and supervised feature transformations are each applied to improve the GMM-HMM model, which in turn provides good training data for the subsequent deep neural network (DNN) training.

The thesis also examines how to build a DNN-HMM model on top of the GMM-HMM model. It first studies the principles and methods of DNN-based acoustic modeling, drawing on the principles and application scenarios of neural networks with different architectures. It then analyzes how neural network models are implemented and used in Kaldi, and uses Kaldi to build and train a DNN-HMM model. Experiments show that the word error rate (WER) of the DNN-HMM model is about 6% lower than that of the best-performing GMM-HMM model. In the decoding stage, a hybrid rescoring method that combines a recurrent neural network (RNN) language model with an N-gram language model further reduces the word error rate, yielding a stable, usable speech recognition system.
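The central WFST operation in decoding-graph construction is composition. As a minimal sketch, assuming epsilon-free transducers and weights in the tropical semiring (for example negative log probabilities, added along a path), composition pairs up the states of two machines and keeps only arcs where the output label of the first matches the input label of the second. The dictionary-based representation and the `compose` name are hypothetical. Kaldi relies on OpenFst for this, and building a real HCLG graph also needs epsilon handling, determinization and minimization, all of which this sketch omits.

```python
from collections import deque

# Hypothetical representation: a WFST is a tuple (start, finals, arcs) where
#   finals: dict state -> final weight
#   arcs:   dict state -> list of (ilabel, olabel, weight, next_state)
# Weights live in the tropical semiring (e.g. negative log probabilities):
# weights along a path are added; competing paths are compared with min.


def compose(a, b):
    """Compose epsilon-free WFSTs: a's output labels are matched to b's inputs."""
    a_start, a_finals, a_arcs = a
    b_start, b_finals, b_arcs = b
    start = (a_start, b_start)
    finals, arcs = {}, {}
    queue, seen = deque([start]), {start}
    # Only state pairs reachable from the start pair are built.
    while queue:
        state = queue.popleft()
        qa, qb = state
        if qa in a_finals and qb in b_finals:
            finals[state] = a_finals[qa] + b_finals[qb]
        arcs[state] = []
        for ilabel, mid, w1, next_a in a_arcs.get(qa, []):
            for mid2, olabel, w2, next_b in b_arcs.get(qb, []):
                if mid == mid2:
                    nxt = (next_a, next_b)
                    arcs[state].append((ilabel, olabel, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return start, finals, arcs


if __name__ == "__main__":
    # A maps a->x (weight 1.0) and b->y (weight 2.0); B maps only x->X.
    A = (0, {0: 0.0}, {0: [("a", "x", 1.0, 0), ("b", "y", 2.0, 0)]})
    B = (0, {0: 0.5}, {0: [("x", "X", 0.5, 0)]})
    start, finals, arcs = compose(A, B)
    print(arcs[start])  # [('a', 'X', 1.5, (0, 0))]
```

The arc for `b` disappears because `y` has no matching input in `B`, which is how composition restricts one machine by another.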
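A GMM-HMM acoustic model scores each feature frame with a per-state Gaussian mixture and searches for the most likely HMM state sequence. The sketch below is a minimal, self-contained illustration assuming diagonal covariances and a tiny toy HMM. The function names are hypothetical, and this is not Kaldi's implementation, since Kaldi decodes over a WFST decoding graph rather than a bare HMM.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, weights, means, variances):
    """Log-likelihood of x under a GMM, summed with log-sum-exp for stability."""
    comps = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))


def viterbi(frames, state_gmms, log_trans, log_init):
    """Most likely HMM state sequence when each state emits via a GMM.

    state_gmms[j] is a (weights, means, variances) tuple for state j;
    log_trans[i][j] and log_init[j] are log probabilities (use
    float("-inf") for forbidden transitions).
    Returns (best_log_score, best_state_path).
    """
    n = len(state_gmms)
    score = [log_init[j] + log_gmm(frames[0], *state_gmms[j]) for j in range(n)]
    backptrs = []
    for x in frames[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor state i for landing in state j at this frame.
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best_i)
            new_score.append(
                score[best_i] + log_trans[best_i][j] + log_gmm(x, *state_gmms[j])
            )
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(range(n), key=lambda j: score[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return score[last], path[::-1]


if __name__ == "__main__":
    # Toy 2-state left-to-right HMM over 1-D features; one Gaussian per state.
    gmms = [([1.0], [[0.0]], [[1.0]]), ([1.0], [[3.0]], [[1.0]])]
    half, never = math.log(0.5), float("-inf")
    trans = [[half, half], [never, 0.0]]
    init = [0.0, never]
    frames = [[0.0], [0.1], [2.9], [3.1]]
    print(viterbi(frames, gmms, trans, init)[1])  # [0, 0, 1, 1]
```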
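Moving from monophones to triphones means modeling each phone together with its left and right neighbors, which is how coarticulation is captured. The helper below is a hypothetical illustration using HTK-style `left-center+right` labels and an assumed `sil` padding symbol at the utterance edges. Kaldi does not enumerate triphones this way (it clusters phonetic contexts with a decision tree), so the sketch only shows what the context expansion looks like.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a monophone sequence into HTK-style 'left-center+right' labels.

    `boundary` is an assumed padding symbol for the utterance edges.
    """
    padded = [boundary, *phones, boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    # The phone "a" appears in two different contexts, so it yields two
    # distinct triphone labels: k-a+t and t-a+sil.
    print(to_triphones(["k", "a", "t", "a"]))
    # ['sil-k+a', 'k-a+t', 'a-t+a', 't-a+sil']
```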
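In a DNN-HMM hybrid, the network estimates state posteriors p(s|x), whereas HMM decoding needs emission likelihoods p(x|s). A common approach, sketched below, converts posteriors into scaled likelihoods by subtracting the log state prior. It assumes the priors p(s) are already available (they are typically estimated from training alignments). The function names are hypothetical, and this illustrates the general hybrid technique rather than the specific network used in the thesis.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax: log p(s|x) for each output state s."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_z for v in logits]


def scaled_log_likelihoods(logits, priors):
    """Turn DNN outputs into HMM emission scores for hybrid decoding.

    Bayes' rule gives p(x|s) = p(s|x) * p(x) / p(s). Since p(x) is the same
    for every state s in a given frame, log p(s|x) - log p(s) can stand in
    for log p(x|s) when comparing paths.
    """
    return [lp - math.log(p) for lp, p in zip(log_softmax(logits), priors)]


if __name__ == "__main__":
    # Equal posteriors, unequal priors: the rarer state gets the higher score.
    print(scaled_log_likelihoods([0.0, 0.0], [0.25, 0.75]))
```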
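The results above are reported as word error rate (WER). WER is the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below implements that standard definition. Kaldi ships its own `compute-wer` tool for scoring, and the sentences used here are made-up examples.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein (edit-distance) alignment.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before reference word i is processed, prev[j] is the edit distance
    # between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match if equal
            )
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```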
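The final decoding step rescores hypotheses with both an RNN language model and an N-gram language model. The abstract does not give the exact combination formula, so the sketch below assumes a common choice. It rescores an n-best list by adding a log-linear interpolation of the two LM scores, scaled by a language-model weight, to the acoustic score. The `Hypothesis` fields, `rnn_weight` and `lm_scale` are hypothetical, and this is an illustration rather than Kaldi's rescoring code.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: str
    acoustic_logp: float  # acoustic-model log-likelihood
    ngram_logp: float     # N-gram LM log-probability of the word sequence
    rnn_logp: float       # RNN LM log-probability of the word sequence


def rescore(nbest, rnn_weight=0.5, lm_scale=10.0):
    """Sort hypotheses best-first by acoustic + scaled, interpolated LM score.

    The LM term is a log-linear interpolation:
        (1 - rnn_weight) * ngram_logp + rnn_weight * rnn_logp
    Both rnn_weight and lm_scale are hypothetical tuning values.
    """
    def total(h):
        lm = (1.0 - rnn_weight) * h.ngram_logp + rnn_weight * h.rnn_logp
        return h.acoustic_logp + lm_scale * lm

    return sorted(nbest, key=total, reverse=True)


if __name__ == "__main__":
    nbest = [
        Hypothesis("recognize speech", -100.0, -8.0, -6.0),
        Hypothesis("wreck a nice beach", -99.0, -7.5, -12.0),
    ]
    print(rescore(nbest, rnn_weight=0.0)[0].words)  # N-gram only: wreck a nice beach
    print(rescore(nbest)[0].words)  # with the RNN LM: recognize speech
```

Setting `rnn_weight=0.0` reduces the score to the N-gram-only baseline, which makes it easy to see how the RNN LM changes the ranking. In practice both weights would be tuned on held-out data.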
Keywords/Search Tags: Speech recognition, Kaldi, Acoustic model, GMM-HMM, DNN-HMM