Research On Out-of-vocabulary Spoken Term Detection

Posted on:2015-03-22

Degree:Master

Type:Thesis

Country:China

Candidate:S F Xiong

GTID:2268330431450135

Subject:Signal and Information Processing

Abstract/Summary:

Spoken term detection (STD) is an important research task in multimedia information retrieval. The current mainstream STD approach is based on large vocabulary continuous speech recognition (LVCSR). Out-of-vocabulary (OOV) query terms are words that are likely to appear in users' search queries but are not in the recognition vocabulary. Compared with in-vocabulary STD, performance deteriorates greatly on OOV terms, and OOV detection remains a major challenge for STD systems. The main problems of OOV STD are the intrinsic uncertainty in pronunciations, the significant diversity in term properties, and the weakness of acoustic and language modeling for such terms.

In this dissertation, to tackle the OOV issue and improve STD performance on OOV terms, we first focus on building a high-performance sub-word speech recognizer. For this purpose, we use a variety of training algorithms to improve speech recognition accuracy. Discriminative training based on the minimum phone error (MPE) criterion is adopted for GMM-HMM acoustic modeling. Furthermore, a deep neural network (DNN) is adopted to replace the GMM, and cross-lingual training and the rectified linear unit (ReLU) activation function are used to improve the performance of the DNN acoustic models. To compensate for the lack of linguistic knowledge in ASR for low-resource languages, automatically generated question sets are used. Three kinds of sub-word units (phoneme, syllable and fragment) are adopted as the decoding units. This reduces the weakness in OOV language modeling and improves the phone recognition accuracy on OOV words. Second, in building the OOV STD systems, we use different search algorithms to account for the different properties of the three sub-word units. Exact-match search based on weighted finite state transducers (WFSTs) is applied in the phone-based STD system to reduce the miss probability, and fuzzy search is applied in the syllable-based and fragment-based STD systems to reduce the false alarm probability. To deal with the diversity in term properties, we use a term-dependent score normalization method. In addition, considering the complementarity among different ASR systems, we propose an STD system combination strategy based on linear logistic regression to further improve the reliability of the confidence scores.

With the proposed methods, we carried out experiments on the NIST STD 2006 English data set and the NIST OpenKWS 2013 Vietnamese data set, and achieved significant improvements in STD performance on OOV terms.
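
The abstract names term-dependent score normalization and system combination by linear logistic regression but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that each detection carries a posterior-like score in (0, 1), that normalization takes the common sum-to-one form (each score divided by the sum of scores for the same term), and that fusion applies a logistic function to a weighted sum of per-system log-odds. The function names, the `gamma` exponent and the toy training data are all hypothetical.

```python
import math
from collections import defaultdict


def sum_to_one_normalize(detections, gamma=1.0):
    """Term-dependent normalization (assumed sum-to-one variant).

    detections: list of (term, score) pairs with score in (0, 1].
    Each score is divided by the sum of (sharpened) scores of its own term,
    so rare and frequent terms end up on a comparable scale.
    """
    totals = defaultdict(float)
    for term, score in detections:
        totals[term] += score ** gamma
    return [(term, (score ** gamma) / totals[term]) for term, score in detections]


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse(scores, weights, bias):
    """Linear logistic regression fusion of one detection's per-system scores."""
    z = bias + sum(w * logit(s) for w, s in zip(weights, scores))
    return sigmoid(z)


def fit_fusion(samples, labels, lr=0.1, epochs=500):
    """Fit fusion weights by batch gradient descent on the logistic loss.

    samples: list of per-system score tuples; labels: 1 = hit, 0 = false alarm.
    """
    n = len(samples[0])
    weights, bias = [0.0] * n, 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for scores, y in zip(samples, labels):
            x = [logit(s) for s in scores]
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Toy data (hypothetical): scores from two systems for four candidate detections.
train_scores = [(0.9, 0.8), (0.7, 0.9), (0.2, 0.3), (0.1, 0.4)]
train_labels = [1, 1, 0, 0]
weights, bias = fit_fusion(train_scores, train_labels)
fused_hit = fuse((0.9, 0.8), weights, bias)
fused_false_alarm = fuse((0.2, 0.3), weights, bias)
```

In this sketch the fused score serves as the final detection confidence. A real system would fit the weights on a held-out development set and would also need a convention for detections that only one of the systems produces.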

Keywords/Search Tags:

Spoken term detection, Speech Recognition, Deep Neural Network, Out-of-vocabulary, System Combination, Confidence

Related items

1	Research On Spoken Term Detection Based On ASR Under Limited-resource Conditions
2	Deep Learning For Spoken Term Detection
3	Research On Confidence Measure For Chinese Spoken Term Detection
4	Research On System Combination For Spoken Term Detection
5	Research On Confidence Measure In Speech Keyword Spotting
6	Research On Speech Keyword Spotting Technology Based On Deep Learning
7	Research On Chinese Spoken Term Detection Based On Deep Learning
8	Research On WFST Based Spoken Term Detection
9	A Study Of Key Problems In Spoken Term Detection
10	Keyword Spotting Based On Sub-word Decoding And System Combination