The Implementation And Optimization Of Speech Recognition System Based On Kaldi

Posted on: 2022-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Li
GTID: 2518306731477654
Subject: Computer technology
Abstract/Summary:
With the widespread application of automatic speech recognition (ASR) technology, there are many speech recognition providers on the market. Because of data sensitivity, cost, and customizability concerns, demand for self-built ASR systems remains. However, most current ASR frameworks are developed for scientific research rather than for production use. In addition, current ASR frameworks do not support customized requirements such as streaming recognition and word biasing. To solve these problems, this paper studies the development and optimization of a telephone ASR system.

This paper first introduces the significance of the topic and the history of ASR. We then introduce the fundamentals of ASR, including the n-gram language model, feature extraction, the acoustic model based on hidden Markov models, the WFST decoding algorithm, and sequence-discriminative training. Next, we analyze the Kaldi ASR framework at the source code level. We then describe the details of system development: the neural network architecture, language model batch processing, and the architecture and development of the ASR service and the telephone call service. Finally, we introduce two usage scenarios of the system.

This paper also introduces several engineering and algorithmic optimizations to the system. To overcome language model mismatch, we propose a decoder hot word biasing algorithm, which improves the recognition accuracy of user-defined hot words when the system's language model does not match the current usage scenario. We implement a streaming recognition decoder and optimize it with backward pointers to avoid repeated lattice generation. To optimize the phone service, we use a ping-pong buffer to reduce audio card delay. We test our system on the open-source data set Aishell2 and on two business data sets, and we analyze and compare the impact of sampling rate on accuracy. Experimental results show that our system achieves relatively good accuracy on the business data sets and can meet the needs of a self-built phone ASR system.
Keywords/Search Tags: Speech Recognition, Decoder, Audio Card, Word Biasing
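The abstract mentions a ping-pong buffer for reducing audio card delay but does not describe the implementation. As an illustration only, the general technique can be sketched as follows: two fixed-size buffers alternate roles, so one is being filled with incoming audio while the other, already full, is handed to the consumer. The class name `PingPongBuffer`, the `frame_size` parameter, and the `write` method are hypothetical and are not taken from the thesis; the sketch is single-threaded and copies each full frame for simplicity.

```python
class PingPongBuffer:
    """Illustrative ping-pong (double) buffer; names are assumptions, not the thesis's code."""

    def __init__(self, frame_size):
        self.frame_size = frame_size              # bytes per full frame
        self.buffers = [bytearray(), bytearray()]
        self.write_index = 0                      # index of the buffer being filled

    def write(self, chunk):
        """Append audio bytes and return the list of frames that became full."""
        ready = []
        data = bytes(chunk)
        while data:
            active = self.buffers[self.write_index]
            space = self.frame_size - len(active)
            active += data[:space]                # fill the active buffer
            data = data[space:]
            if len(active) == self.frame_size:
                # The active buffer is full: hand a copy to the consumer,
                # then swap roles and start filling the other buffer.
                ready.append(bytes(active))
                self.write_index ^= 1
                self.buffers[self.write_index].clear()
        return ready


buf = PingPongBuffer(frame_size=4)
first = buf.write(b"ab")        # not enough data for a frame yet
second = buf.write(b"cdef")     # completes one frame, starts the next
third = buf.write(b"ghijkl")    # completes two more frames
```

In a real telephony service the producer (the audio card callback) and the consumer (the recognizer) would typically run concurrently, so the swap would need synchronization; that is left out of this sketch.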