
Automatic Speech Recognition

Posted on: 2021-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: Muhib Ullah
Full Text: PDF
GTID: 2428330611966325
Subject: Electrical and Computer Engineering
Abstract/Summary:
The purpose of an Automatic Speech Recognition (ASR) system is to transcribe a continuous acoustic signal into text and extract linguistic information from the acoustic stream. Current ASR systems transcribe continuous speech with Word Error Rates (WER) ranging from roughly 10% down to 5.7%. For the last two decades, most of these systems have used Hidden Markov Models (HMMs) in conjunction with Gaussian Mixture Models (GMMs) to model the phonetic units corresponding to words: HMMs model the temporal variability of speech, while GMMs model the distribution of the acoustic input within each HMM state. HMMs provide an effective way of modelling time-varying spectral vector sequences. More recently, Deep Neural Networks (DNNs) have been used to determine how well each HMM state fits the acoustic input.

The acoustic coefficients of an utterance can be derived in one of two pre-processing domains: (a) spectral-based parameters and (b) dynamic time series. Spectral-based parameters are the most widely used domain, and Mel-Frequency Cepstral Coefficients (MFCCs) are the most common spectral-based approach in recognition tasks. We choose MFCCs computed from the spectrogram as the pre-processing scheme for uttered speech; although transcribing the raw utterance with a Recurrent Neural Network (RNN) or Restricted Boltzmann Machines (RBMs) is possible, the high computational cost can make performance worse.

Recurrent connections are applied to a hidden layer of the feed-forward network, allowing the model to capture temporal dependencies. Mini-batch gradient descent (MGD) is chosen as the training method, and the back-propagation through time (BPTT) algorithm is applied to make training the RNN acoustic model more efficient and effective. Finally, we address the vanishing-gradient and exploding-gradient problems of the basic RNN by presenting an advanced, gated RNN architecture: Long Short-Term Memory (LSTM).

The major contributions of this work are twofold. First, we present detailed research on a speech recognition system for Large-Vocabulary Continuous Speech, in which an Acoustic Model (AM) over context-dependent phones is combined with a pronunciation dictionary and a Language Model (LM). To make recognition faster, we explore multi-pass token-passing techniques, and a beam-search approach is used to increase computational efficiency by narrowing the search space. Second, we show that, instead of increasing the number of units in each RNN layer, introducing Long Short-Term Memory units gives better performance. Moreover, LSTM-based recurrent networks make efficient use of model parameters to remember long-term sequences, and LSTMs dominate standard RNNs on the long-range contextual dependencies previously considered out of reach.
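The gating mechanism by which LSTMs avoid the vanishing-gradient problem of the basic RNN can be sketched as a single scalar-valued cell step. This is a minimal illustration of the standard LSTM equations, not the thesis's actual implementation; the function name `lstm_step` and the toy weight layout are assumptions for clarity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM time step on scalar values.
    W maps each gate name to a (w_x, w_h, b) triple; toy weights for clarity."""
    pre = {g: W[g][0] * x + W[g][1] * h_prev + W[g][2]
           for g in ("i", "f", "o", "c")}
    i = sigmoid(pre["i"])    # input gate: how much new content to write
    f = sigmoid(pre["f"])    # forget gate: how much old cell state to keep
    o = sigmoid(pre["o"])    # output gate: how much cell state to expose
    g = math.tanh(pre["c"])  # candidate cell update
    c = f * c_prev + i * g   # additive update: gradients flow through f unchanged
    h = o * math.tanh(c)     # hidden state passed to the next time step
    return h, c
```

With the forget gate saturated open (f close to 1) and the input gate shut (i close to 0), the cell state is carried forward unchanged across arbitrarily many steps, which is exactly the long-term memory behaviour a plain tanh RNN lacks.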
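The beam-search pruning mentioned in the first contribution can be sketched as follows. This is a simplified, frame-synchronous version over toy per-frame distributions, assumed for illustration; the decoder in the thesis operates over a token-passing lattice rather than plain dictionaries.

```python
import math

def beam_search(frame_log_probs, beam_width):
    """Decode a sequence of per-frame log-probability tables.
    frame_log_probs: list of dicts mapping token -> log probability.
    Only the beam_width highest-scoring partial hypotheses survive each frame,
    which bounds the search space instead of exploring every path."""
    beams = [((), 0.0)]  # (token sequence, cumulative log probability)
    for table in frame_log_probs:
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in table.items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune: keep only the beam
    return beams[0]  # best surviving hypothesis and its score
```

Pruning trades exactness for speed: with a vocabulary of V tokens over T frames the full search space is V**T paths, while the beam explores at most beam_width * V candidates per frame.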
Keywords/Search Tags: ASR, Hidden Markov Model, MFCCs, Acoustic Model, RNN, LSTM, BPTT