Research And Implementation Of Algorithms For Increasing Speech Recognition Ratio

Posted on: 2022-05-17  Degree: Master  Type: Thesis
Country: China  Candidate: M Dong
GTID: 2518306605470224  Subject: Communication and Information System
Abstract/Summary:
Speech, as the main way people communicate, is the most direct and convenient form of interaction. Speech recognition technology has developed considerably, and several good open source speech recognition systems have emerged, including HTK, Kaldi and Sphinx. After analyzing these open source systems, this thesis takes Sphinx as the basis for research on algorithms that improve the speech recognition ratio, and on their implementation. A general speech recognition system includes four modules: acoustic model, language model, dictionary construction and decoding search. Among them, the acoustic model governs the mapping from input speech to basic acoustic units and is the key to the speech recognition algorithm. This thesis therefore improves the speech recognition ratio by studying the acoustic model training algorithm.

This thesis uses the classic GMM-HMM algorithm as the basic algorithm for acoustic model training. To design the best command word speech recognition system, three aspects of the algorithm are studied: the selection of basic acoustic units, the selection of the number of HMM states and the number of Gaussian mixture components, and MMIE discriminative training. Firstly, the influence of the choice of basic acoustic unit on acoustic model performance is studied. For Chinese command words, three different basic acoustic units are compared: phonemes, syllables and words. Experiments show that selecting phonemes as the basic acoustic unit gives a higher speech recognition ratio. Secondly, the effect of the number of Gaussian mixture components and the number of hidden Markov states on acoustic model performance is studied. Experiments show that blindly increasing the number of Gaussian components does not keep improving acoustic model performance, whereas a five-state HMM achieves a better speech recognition ratio than a three-state HMM. Finally, an MMIE discriminative training module is added to further optimize the trained acoustic model. Experiments show that, for different baseline speech recognition systems, adding MMIE discriminative training improves recognition accuracy.

In the acoustic model training algorithm, the covariance matrices of the Gaussian mixture model are used to store voice information, which causes part of that information to be lost and thereby degrades the performance of the speech recognition system. In response to this problem, this thesis improves and implements an acoustic model training algorithm based on a hybrid covariance matrix. Because the covariance matrix of the Gaussian mixture model affects different training stages differently, and because different forms of covariance matrix have different characteristics, a staged hybrid covariance matrix acoustic model is proposed. The full covariance matrix is used in the first stage, when the context-independent (CI) model is trained; matrix conversion is performed in the second and fourth stages, when the context-dependent (CD) model is trained; and specific matrix conversion strategies are given. Experiments were carried out on a Chinese data set and an English data set. They show that, without significantly increasing training time, the improved hybrid covariance matrix acoustic model resolves the omission and over-reading of voice information that arise when a single form of covariance matrix is used throughout training, and effectively improves the speech recognition ratio.

Based on the above research, this thesis designs and implements a real-time Chinese command word speech recognition system. To address the problem of background noise being misrecognized as text in practical applications, a voice activity detection (VAD) algorithm based on the Gaussian mixture model is used for front-end processing of the speech signal. The algorithm models speech segments and noise segments separately, judges the presence of speech by the maximum likelihood criterion to obtain the speech segments, and finally extracts the speech and sends it to the Chinese command word speech recognition system for recognition. Testing shows that the recognition ratio of the real-time speech recognition system improves after VAD is added.
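
The GMM-based speech activity decision described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the one-dimensional log-energy feature, the function names `gmm_log_likelihood` and `detect_speech`, and all mixture parameters below are assumptions chosen for illustration, and the mixtures are taken as already trained. Each frame is scored under a speech mixture and a noise mixture, and the frame is labeled speech when the speech model gives the higher likelihood.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture."""
    log_terms = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def detect_speech(frames, speech_gmm, noise_gmm):
    """Label a frame as speech when the speech model is the more likely one."""
    return [
        gmm_log_likelihood(x, *speech_gmm) > gmm_log_likelihood(x, *noise_gmm)
        for x in frames
    ]

# Hypothetical, hand-picked parameters: (weights, means, variances).
speech_gmm = ([0.5, 0.5], [-2.0, -1.0], [0.25, 0.25])
noise_gmm = ([0.5, 0.5], [-6.0, -5.0], [0.25, 0.25])

# Hypothetical per-frame log-energy values.
log_energy = [-5.8, -5.2, -1.9, -1.1, -6.1]
is_speech = detect_speech(log_energy, speech_gmm, noise_gmm)
print(is_speech)
```

In a real front end the two mixtures would be estimated from labeled speech and noise frames, and only the frames flagged as speech would be passed on to the recognizer.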
Keywords/Search Tags: Speech Recognition, Gaussian Mixture Model, Hidden Markov Model, Acoustic Model, Hybrid Covariance Matrix