Research On Out-of-vocabulary Spoken Term Detection

Posted on: 2015-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: S F Xiong
Full Text: PDF
GTID: 2268330431450135
Subject: Signal and Information Processing
Abstract/Summary:
Spoken term detection (STD) is an important research task in multimedia information retrieval. The current mainstream STD approach is based on large vocabulary continuous speech recognition (LVCSR). Out-of-vocabulary (OOV) query terms are words that are likely to appear in users' search queries but are absent from the recognition vocabulary. Compared with in-vocabulary STD, performance deteriorates greatly in OOV detection, which remains a major challenge for STD systems. The main problems of OOV STD are intrinsic uncertainty in pronunciations, significant diversity in term properties, and weak acoustic and language modeling.

In this dissertation, to tackle the OOV issue and improve STD performance on OOV terms, we first focus on building a high-performance sub-word speech recognizer. For this purpose, we use a variety of training algorithms to improve speech recognition accuracy. Discriminative training based on the minimum phone error (MPE) criterion is adopted in GMM-HMM acoustic modeling. Furthermore, a deep neural network (DNN) is adopted to replace the GMM, and cross-lingual training and the rectified linear unit (ReLU) activation function are used to improve the performance of the DNN acoustic models. To address the lack of linguistic knowledge in ASR for low-resource languages, an automatically generated question set is used. Three kinds of sub-word units (phonemes, syllables, and fragments) are adopted as decoding units. This reduces the weakness of OOV language modeling and improves the phone recognition accuracy of OOV words. Second, in building the OOV STD systems, we use different search algorithms to account for the different properties of the three sub-word units.
A complete-match search based on weighted finite-state transducers (WFSTs) is applied in the phone-based STD system to reduce the miss probability, and fuzzy search is applied in the syllable-based and fragment-based STD systems to reduce the false alarm probability. To deal with the diversity in term properties, we use a term-dependent score normalization method. In addition, considering the complementarity among different ASR systems, we propose an STD system combination strategy based on linear logistic regression to further improve the reliability of the confidence scores.

With these proposed methods, we carried out experiments on the NIST STD 2006 English data set and the NIST OpenKWS 2013 Vietnamese data set, and achieved significant improvements in STD performance on OOV terms.
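
The system combination idea can be illustrated with a minimal sketch of linear logistic regression fusion. This is not the implementation from the dissertation: the function names (`fuse`, `train_fusion`), the plain gradient-descent training, the learning rate, and the toy scores are all assumptions chosen for illustration. Each candidate detection is assumed to carry one confidence score per sub-word system (phone, syllable, fragment), and the fused score is a sigmoid of their weighted sum.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse(scores, weights, bias):
    """Map per-system confidence scores to one fused confidence in (0, 1)."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return sigmoid(z)


def train_fusion(samples, labels, lr=0.5, epochs=2000):
    """Fit fusion weights by batch gradient descent on the logistic loss.

    samples: list of score tuples, one score per system (hypothetical data).
    labels:  1 for a correct detection, 0 for a false alarm.
    """
    n = len(samples[0])
    m = len(samples)
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = fuse(x, weights, bias) - y  # gradient of the loss w.r.t. z
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Toy development set (made up): (phone, syllable, fragment) scores.
dev_scores = [(0.9, 0.8, 0.7), (0.8, 0.6, 0.9), (0.2, 0.3, 0.1), (0.4, 0.1, 0.3)]
dev_labels = [1, 1, 0, 0]

weights, bias = train_fusion(dev_scores, dev_labels)
likely_hit = fuse((0.85, 0.7, 0.8), weights, bias)
likely_false_alarm = fuse((0.2, 0.2, 0.2), weights, bias)
```

In this sketch the fused score is thresholded to make the final detection decision; detections that several systems agree on receive a higher fused confidence than those supported by only one system.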
Keywords/Search Tags: Spoken term detection, Speech Recognition, Deep Neural Network, Out-of-vocabulary, System Combination, Confidence