
Speaker Adaptation Of DNN-HMM Acoustic Model For Speech Recognition

Posted on: 2016-07-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S F Xue
Full Text: PDF
GTID: 1228330470958003
Subject: Signal and Information Processing
Abstract/Summary:
Speaker adaptation has been an important research topic in automatic speech recognition (ASR) for decades. Speaker adaptation techniques attempt to optimize ASR performance by transforming speaker-independent (SI) models towards one particular speaker, or by modifying the target speaker's features to match the given speaker-independent models, based on a relatively small amount of adaptation data. In traditional Gaussian mixture model (GMM)-hidden Markov model (HMM) speech recognition systems, speaker-adapted (SA) systems can cut errors by 5%-30% over SI systems. As hybrid deep neural network (DNN)-HMM models have revived acoustic modelling for large vocabulary continuous speech recognition, effective speaker adaptation for DNNs has become a very interesting problem, since most traditional techniques do not carry over. In this thesis, we investigate fast DNN training on multiple GPUs and then propose several fast speaker adaptation methods for DNNs built on top of it. The main contributions are as follows.

Firstly, we investigate a fast DNN training algorithm and its implementation on multiple GPUs. We recast DNN training as matrix operations well suited to GPUs, implement it in CUDA C, and propose two pre-training schedules, named Rotate Per Epoch and Complete Within Split, to conduct DNN training across multiple GPUs. Phone recognition experiments on the TIMIT corpus show that the proposed implementation significantly improves training speed with comparable recognition accuracy.

Secondly, we extend the idea of speaker-code based adaptation in feature space (fSA-SC) and propose an alternative direct adaptation method that performs speaker adaptation in model space (mSA-SC) without using adaptation NNs. Compared with fSA-SC, this direct adaptation method using speaker codes is quite effective for large and deep neural networks.
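The model-space speaker-code idea can be illustrated with a minimal numpy sketch: a small per-speaker code vector feeds into a layer alongside the acoustic input, and adaptation updates only that code while the shared weights stay fixed. All dimensions, variable names, and the toy squared-error objective below are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative layer: y = sigmoid(W @ x + A @ c + b), where c is a small
# per-speaker code injected directly into the layer (model-space variant,
# no separate adaptation network). Dimensions are toy values.
dim_in, dim_out, dim_code = 40, 128, 8
W = 0.1 * rng.standard_normal((dim_out, dim_in))    # shared acoustic weights
A = 0.1 * rng.standard_normal((dim_out, dim_code))  # shared code weights
b = np.zeros(dim_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, code):
    return sigmoid(W @ x + A @ code + b)

# Adaptation: hold W, A, b fixed; update only the speaker code by gradient
# descent on a little adaptation data (here one frame and a toy squared-error
# target, purely for illustration).
x = rng.standard_normal(dim_in)
target = forward(x, rng.standard_normal(dim_code))  # pretend "true" speaker
code = np.zeros(dim_code)                           # new speaker starts at zero
losses = []
for _ in range(200):
    y = forward(x, code)
    losses.append(0.5 * np.sum((y - target) ** 2))
    grad_z = (y - target) * y * (1.0 - y)  # d(loss)/d(pre-activation)
    code -= 0.5 * (A.T @ grad_z)           # only the 8-dim code is updated
```

Because only the code is learned per speaker, the adapted parameter count is tiny compared with the full network, which is what makes the method practical with a few adaptation utterances.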
Moreover, we use i-vectors to replace the speaker codes of our previous work and achieve slightly better performance. Experimental results show that the proposed speaker-code based rapid adaptation method is effective not only on small recognition tasks but also on very large scale tasks.

Thirdly, we investigate the lattice-based sequence-level maximum mutual information (MMI) criterion for speaker adaptation of DNNs. More importantly, we propose a new speaker adaptive training (SAT) strategy for DNNs based on speaker codes. Instead of adapting a pre-trained speaker-independent DNN model on adaptation data, the proposed SAT method estimates a more effective speaker-independent DNN model in a single training procedure by applying speaker normalization based on speaker codes. Results on Switchboard show that the proposed speaker-code based adaptation methods achieve up to 25% relative error reduction using only a few dozen adaptation utterances per speaker.

Finally, we propose a new speaker adaptation method for hybrid NN/HMM speech recognition based on singular value decomposition (SVD). We apply SVD to the weight matrices of trained DNNs and then tune the rectangular diagonal matrices on the adaptation data. This alleviates over-fitting by updating the weight matrices only slightly, through modifying part of the singular values. We further use SVD to initialize the speaker codes and connection weights, obtaining ASR performance comparable to previous work but with a smaller speaker code size and much lower computational complexity, which is important for efficiency.
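The SVD-based adaptation can be sketched in a few lines: factor each trained weight matrix once, then let the adaptation data move only the singular values, so each speaker stores one vector per layer instead of a full matrix. The matrix size and the stand-in update below are assumptions for illustration, not the thesis's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained hidden-layer weight matrix (size is illustrative).
W = rng.standard_normal((1024, 1024)) / 32.0

# Factor the speaker-independent weights once: W = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Adaptation tunes only the singular values s (the diagonal factor), keeping
# U and Vt fixed; perturbing W only slightly is what curbs over-fitting.
# The random perturbation here stands in for gradient steps on adaptation data.
s_adapted = s * (1.0 + 0.01 * rng.standard_normal(s.shape))
W_adapted = (U * s_adapted) @ Vt  # column-wise scaling == U @ diag(s_adapted)

# Per-speaker storage: 1024 singular values vs. 1024*1024 full weights.
print(W.size, s.size)  # 1048576 1024
```

Restricting the update to the diagonal factor shrinks the per-speaker parameter count by three orders of magnitude in this toy example, which is the source of both the robustness to small adaptation sets and the reduced computation noted above.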
Keywords/Search Tags: speech recognition, deep neural network, speaker adaptation, fast DNN training, speaker code, i-vector, sequence discriminative training, singular value decomposition