
Speaker Adaptation Of DNN-HMM Acoustic Model For Speech Recognition

Posted on: 2016-07-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S F Xue
Full Text: PDF
GTID: 1228330470958003
Subject: Signal and Information Processing
Abstract/Summary:
Speaker adaptation has been an important research topic in automatic speech recognition (ASR) for decades. Speaker adaptation techniques attempt to optimize ASR performance by transforming speaker-independent (SI) models towards one particular speaker, or by modifying the target speaker's features to match the given speaker-independent models, based on a relatively small amount of adaptation data. In traditional Gaussian mixture model (GMM)-hidden Markov model (HMM) speech recognition systems, speaker-adapted (SA) systems can cut errors by 5%-30% over SI systems. As hybrid deep neural network (DNN)-HMM models have revived acoustic modelling for large vocabulary continuous speech recognition, effective speaker adaptation for DNNs has become a very interesting problem, since most traditional techniques do not carry over. In this thesis, we investigate fast DNN training on multiple GPUs and then propose several fast speaker adaptation methods for DNNs built on top of it. The main contributions are as follows.

Firstly, we investigate a fast DNN training algorithm and its implementation on multiple GPUs. We recast DNN training as matrix operations well suited to GPUs, implement it in CUDA C, and propose two pre-training schedules, named Rotate Per Epoch and Complete Within Split, to conduct DNN training across multiple GPUs. Phone recognition experiments on the TIMIT corpus show that the proposed implementation significantly improves training speed with comparable recognition accuracy.

Secondly, we extend the idea of speaker-code based adaptation in feature space (fSA-SC) and propose an alternative direct adaptation method that performs speaker adaptation in model space (mSA-SC) without using adaptation NNs. Compared with fSA-SC, this direct adaptation method using speaker codes is quite effective for large and deep neural networks.
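The model-space speaker-code idea can be illustrated with a minimal numpy sketch: a small per-speaker code vector feeds into a layer alongside the acoustic input, and adaptation updates only that code while the shared weights stay fixed. All dimensions, variable names, and the toy squared-error objective below are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative layer: y = sigmoid(W @ x + A @ c + b), where c is a small
# per-speaker code injected directly into the layer (model-space variant,
# no separate adaptation network). Dimensions are toy values.
dim_in, dim_out, dim_code = 40, 128, 8
W = 0.1 * rng.standard_normal((dim_out, dim_in))    # shared acoustic weights
A = 0.1 * rng.standard_normal((dim_out, dim_code))  # shared code weights
b = np.zeros(dim_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, code):
    return sigmoid(W @ x + A @ code + b)

# Adaptation: hold W, A, b fixed; update only the speaker code by gradient
# descent on a little adaptation data (here one frame and a toy squared-error
# target, purely for illustration).
x = rng.standard_normal(dim_in)
target = forward(x, rng.standard_normal(dim_code))  # pretend "true" speaker
code = np.zeros(dim_code)                           # new speaker starts at zero
losses = []
for _ in range(200):
    y = forward(x, code)
    losses.append(0.5 * np.sum((y - target) ** 2))
    grad_z = (y - target) * y * (1.0 - y)  # d(loss)/d(pre-activation)
    code -= 0.5 * (A.T @ grad_z)           # only the 8-dim code is updated
```

Because only the code is learned per speaker, the adapted parameter count is tiny compared with the full network, which is what makes the method practical with a few adaptation utterances.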
Moreover, we use i-vectors to replace the speaker codes of our previous work and achieve slightly better performance. Experimental results show that the proposed speaker-code based rapid adaptation method is effective not only on small recognition tasks but also on very large scale tasks.

Thirdly, we investigate the lattice-based sequence-level maximum mutual information (MMI) criterion for speaker adaptation of DNNs. More importantly, we propose a new speaker adaptive training (SAT) strategy for DNNs based on speaker codes. Instead of adapting a pre-trained speaker-independent DNN model on adaptation data, the proposed SAT method estimates a more effective speaker-independent DNN model in a single training procedure by applying speaker normalization based on speaker codes. Results on Switchboard show that the proposed speaker-code based adaptation methods achieve up to 25% relative error reduction using only a few dozen adaptation utterances per speaker.

Finally, we propose a new speaker adaptation method for hybrid NN/HMM speech recognition based on singular value decomposition (SVD). We apply SVD to the weight matrices of trained DNNs and then tune the rectangular diagonal matrices on the adaptation data. This alleviates over-fitting by updating the weight matrices only slightly, through modifying part of the singular values. We further use SVD to initialize the speaker codes and connection weights, obtaining ASR performance comparable to previous work but with a smaller speaker code size and much lower computational complexity, which is important for efficiency.
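The SVD-based adaptation can be sketched in a few lines: factor each trained weight matrix once, then let the adaptation data move only the singular values, so each speaker stores one vector per layer instead of a full matrix. The matrix size and the stand-in update below are assumptions for illustration, not the thesis's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained hidden-layer weight matrix (size is illustrative).
W = rng.standard_normal((1024, 1024)) / 32.0

# Factor the speaker-independent weights once: W = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Adaptation tunes only the singular values s (the diagonal factor), keeping
# U and Vt fixed; perturbing W only slightly is what curbs over-fitting.
# The random perturbation here stands in for gradient steps on adaptation data.
s_adapted = s * (1.0 + 0.01 * rng.standard_normal(s.shape))
W_adapted = (U * s_adapted) @ Vt  # column-wise scaling == U @ diag(s_adapted)

# Per-speaker storage: 1024 singular values vs. 1024*1024 full weights.
print(W.size, s.size)  # 1048576 1024
```

Restricting the update to the diagonal factor shrinks the per-speaker parameter count by three orders of magnitude in this toy example, which is the source of both the robustness to small adaptation sets and the reduced computation noted above.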
Keywords/Search Tags: speech recognition, deep neural network, speaker adaptation, fast DNN training, speaker code, i-vector, sequence discriminative training, singular value decomposition