
Research On Speaker Adaptation Of Deep Neural Network

Posted on: 2018-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Liang
GTID: 2348330563951325
Subject: Information and Communication Engineering
Abstract/Summary:
As deep neural networks (DNNs) are used more and more widely in speech recognition, DNN-based speech recognition systems have become a major research focus. Like the traditional GMM-HMM model, the DNN-based framework still suffers from the mismatch between training and test data, which limits the practicality of speech recognition systems. Using a small amount of adaptation data to improve the match between the model and the test data is therefore another active research topic. Speaker adaptation reduces the negative impact of this mismatch by adjusting the parameters of the neural network. This thesis discusses practical methods for improving recognition performance by exploiting speaker information and modifying the network structure. The main content is as follows:

Because of the bottleneck layer, a tandem system performs relatively poorly, even with some existing speaker adaptation methods. To solve this problem, a speaker adaptation method based on SNMF and i-vectors is proposed and implemented in two ways. In the first, speaker-aware training is used to train a common DNN with i-vectors; one of the DNN layers is then forced to become a bottleneck layer by decomposing its weight matrix with the SNMF algorithm; finally, the bottleneck features are extracted with this model. In the second, a speaker-independent neural network is trained, an adaptive network is then learned from the i-vectors, and the parameters of the speaker-independent network are updated; finally, the SNMF algorithm is used to generate a bottleneck layer, and features containing speaker information are extracted. A speech recognition system is then built on the extracted speaker-dependent features. In speech recognition tasks on the WSJ corpus and the Czech corpus, this method further improves recognition performance, which shows that the algorithm performs well both on a corpus with ample data and on one with limited data.

In the i-vector-based speaker adaptation method, the i-vector is extracted from MFCCs, whose robustness is relatively poor. A speaker adaptation method based on a modified i-vector is therefore proposed. A singular value matrix decomposition algorithm is introduced for low-dimensional feature extraction, and the resulting more robust feature vectors are used for i-vector training and extraction. Experiments on the Czech corpus and the WSJ corpus show that the method outperforms both the system without speaker adaptation and the system with traditional i-vector-based speaker adaptation.

Moreover, the speaker embedding feature, which handles short speech segments well, is introduced as the speaker information for the speaker-aware training algorithm, and a speaker adaptation method based on the speaker embedding feature is proposed. In this method, first-order statistics and speaker labels are used to train the speaker embedding feature extractor, and speaker adaptation is then performed by concatenating the speaker embedding feature with the original input features. Experiments on the TIMIT corpus show that this method outperforms the traditional DNN speech recognition system.
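
Two operations recur in the abstract: concatenating a per-speaker vector (an i-vector or speaker embedding) with the input features, and turning one layer into a bottleneck by factoring its weight matrix into two smaller matrices. The following is a minimal sketch of both ideas, not the thesis implementation. The function names, the toy values, and the hand-picked factors A and B are all assumptions made for illustration; in particular, the factors are not produced by SNMF, they only show how a factored layer exposes a low-dimensional bottleneck output.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def append_speaker_vector(frames: Sequence[Sequence[float]],
                          speaker_vec: Sequence[float]) -> Matrix:
    """Concatenate the same speaker vector onto every acoustic frame."""
    return [list(frame) + list(speaker_vec) for frame in frames]


def matvec(matrix: Sequence[Sequence[float]], vec: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def bottleneck_features(frame: Sequence[float],
                        first_factor: Sequence[Sequence[float]]) -> Vector:
    """Output of the first factor of a layer whose weights are W = A * B.

    first_factor plays the role of B (k x n); its k outputs are the
    bottleneck features.
    """
    return matvec(first_factor, frame)


# Toy input: two 3-dimensional frames and a made-up 2-dimensional speaker vector.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
speaker_vec = [1.0, -1.0]
augmented = append_speaker_vector(frames, speaker_vec)

# Hypothetical factors: B maps 5 inputs to a 2-unit bottleneck, A maps the
# bottleneck to 3 outputs. W = A * B is the single layer they replace.
B = [[1.0, 0.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 1.0, 0.0, 1.0]]
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
W = matmul(A, B)

bn = bottleneck_features(augmented[0], B)
full_layer = matvec(W, augmented[0])
factored_layer = matvec(A, bn)

# Because W equals A * B exactly here, both paths give the same layer output.
assert all(math.isclose(u, v, abs_tol=1e-9)
           for u, v in zip(full_layer, factored_layer))
print("augmented frame dimension:", len(augmented[0]))
print("bottleneck dimension:", len(bn))
```

In a real system the factorization would only approximate the trained weight matrix, so the two paths would agree approximately rather than exactly, and the network would typically be fine-tuned afterwards.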
Keywords/Search Tags: Deep Neural Networks, Speaker Adaptation, Matrix Decomposition, i-vector, Speaker Adaptation Training