
Research On Chinese Spoken Term Detection Based On Deep Learning

Posted on: 2016-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: C S Wang
Full Text: PDF
GTID: 2308330479490040
Subject: Computer Science and Technology
Abstract/Summary:
Spoken term detection (STD) is the task of automatically detecting a set of keywords in continuous speech. One mainstream approach to STD is based on LVCSR (Large Vocabulary Continuous Speech Recognition). LVCSR-based STD usually uses a two-stage model, i.e. a recognition stage followed by a detection stage, so the performance of speech recognition has a great influence on STD.

Traditional spoken term detection systems usually adopt a GMM-HMM acoustic model for LVCSR, which combines a GMM (Gaussian Mixture Model) with an HMM (Hidden Markov Model). With the great impact of deep learning on speech recognition, however, researchers have begun to replace the GMM with a DNN (Deep Neural Network), and results show that the DNN-HMM model greatly improves recognition accuracy compared with the GMM-HMM model. We therefore use a DNN-HMM model as the acoustic model of LVCSR and build our STD system on it. Our experimental results show that, compared with the GMM-HMM acoustic model, the DNN-HMM acoustic model not only achieves better recognition accuracy but also greatly improves STD performance.

Because the two stages of LVCSR-based STD are only loosely coupled, we study giving keywords greater weight during acoustic model training to improve the model's ability to represent keywords. Discriminative training of speech recognition considers model training and recognition results jointly: it defines an objective function and optimizes the model parameters according to that function. We therefore use discriminative training to establish a connection between the acoustic model and spoken term detection. The basic idea is to put more weight on keywords during discriminative training, which is called discriminative training with non-uniform criteria. We first consider a non-uniform MCE (Minimum Classification Error) criterion, which puts more weight on keywords than traditional MCE training does. After defining the non-uniform MCE objective function, we use it to optimize the parameters of the acoustic model. Experiments show that the non-uniform MCE method improves the performance of our STD system.

During non-uniform MCE training, imposing the same weight in every optimization iteration can lead to severe over-training when fairly large weights are used. We therefore apply the adaptive boosting (AdaBoost) technique to adjust the weights dynamically during training, which alleviates the over-training problem. Experimental results show that non-uniform MCE training based on AdaBoost gives better performance. In addition, we study a non-uniform sMBR (state-level Minimum Bayes Risk) criterion; non-uniform sMBR training also improves performance. Finally, we summarize and compare the two non-uniform discriminative training methods.
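To make the non-uniform criterion concrete, the following is a minimal sketch of a keyword-weighted MCE objective, assuming the standard sigmoid-smoothed MCE formulation; the per-token weight \varepsilon_r and its values are illustrative assumptions rather than the exact objective defined in the thesis.

L(\Lambda) = \sum_{r=1}^{R} \varepsilon_r \, \ell\big(d_r(X_r;\Lambda)\big),
\qquad
\ell(d) = \frac{1}{1 + \exp(-\gamma d + \theta)},

d_r(X_r;\Lambda) = -g_{y_r}(X_r;\Lambda)
  + \log\left[\frac{1}{M-1}\sum_{j \ne y_r} \exp\big(\eta\, g_j(X_r;\Lambda)\big)\right]^{1/\eta},

with \varepsilon_r > 1 for training tokens that contain a keyword and \varepsilon_r = 1 otherwise; g_j(X_r;\Lambda) denotes the discriminant (log-likelihood) score of class j under the acoustic model \Lambda.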
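The AdaBoost-style dynamic weight adjustment can be sketched as below, assuming a simple scheme in which per-keyword weights are re-estimated between discriminative-training iterations from dev-set miss rates; the function name update_keyword_weights, its parameters, and the multiplicative update rule are illustrative assumptions, not the exact procedure used in the thesis.

import math

def update_keyword_weights(weights, miss_rates, beta=1.0, max_weight=8.0):
    """AdaBoost-style multiplicative update of the non-uniform MCE weights.

    Keywords that are still missed often after the latest training iteration
    get a larger multiplicative boost; keywords already detected well get a
    smaller one.  Renormalising keeps the average weight constant, so a large
    fixed emphasis never persists across every iteration, which is the source
    of the over-training problem with static non-uniform weights.

    weights    : dict keyword -> current weight
    miss_rates : dict keyword -> miss rate in [0, 1] measured on a dev set
    """
    boosted = {kw: w * math.exp(beta * miss_rates.get(kw, 0.0))
               for kw, w in weights.items()}
    # Renormalise so the mean weight stays at its original value.
    old_mean = sum(weights.values()) / len(weights)
    new_mean = sum(boosted.values()) / len(boosted)
    scale = old_mean / new_mean
    return {kw: min(w * scale, max_weight) for kw, w in boosted.items()}

# Hypothetical training loop (run_mce_iteration and keyword_miss_rates are
# assumed helpers, not real library functions):
# for it in range(num_iterations):
#     model = run_mce_iteration(model, train_data, keyword_weights)
#     miss = keyword_miss_rates(model, dev_data, keywords)
#     keyword_weights = update_keyword_weights(keyword_weights, miss)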
Keywords/Search Tags: speech recognition, spoken term detection, deep learning, discriminative training, minimum classification error