
Research On Spoken Term Detection Based On ASR Under Limited-resource Conditions

Posted on: 2017-05-18   Degree: Master   Type: Thesis
Country: China   Candidate: S L Yuan
GTID: 2308330485951800   Subject: Signal and Information Processing
Abstract/Summary:
Spoken term detection (STD) relies mainly on the transcriptions produced by an automatic speech recognition (ASR) system. In recent years, many research groups have begun to focus on STD under limited-resource conditions. In this dissertation, we investigate several techniques for STD under limited-resource conditions. These techniques aim to improve the performance of the ASR system, reduce missed detections in STD, and enhance the confidence measure.

Since ASR performance plays a vital role in STD, we use a variety of algorithms to improve ASR recognition accuracy. The acoustic model is based on deep neural networks (DNNs), and phonetic questions generated automatically in a data-driven manner are used to tie the triphone states. To address data deficiency, DNN models trained on other languages are used as the initial networks for the target-language DNN. Data augmentation, comprising vocal tract length perturbation (VTLP) and the addition of artificial noise, is used to create replicas of the training samples. Furthermore, sequence-discriminative training criteria, namely maximum mutual information (MMI) and state-level minimum Bayes risk (sMBR), are adopted to improve DNN performance.

The performance of an STD system is measured as a function of missed detections and false alarms. In practice, STD applications care more about missed detections than about false alarms, so it is important for an STD system to reduce missed detections. In our work, we smooth the posteriorgrams of the DNN output layer to reduce the missed detection rate. Two smoothing methods are adopted, one linear and one nonlinear, and both substantially reduce missed detections of keywords.
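The abstract says only that DNNs from other languages serve as the initial networks. One common way to do this, assumed here rather than confirmed by the thesis, is to copy the hidden layers of the source-language DNN and replace its softmax layer with a freshly initialised one sized for the target language's tied states, then fine-tune on target data. The sketch below stores each layer as a hypothetical `(W, b)` NumPy pair with `W` of shape `(in_dim, out_dim)`; `init_from_source_dnn` is an illustrative name, not an existing API.

```python
import numpy as np

def init_from_source_dnn(source_layers, num_target_states, rng=None):
    """Build a target-language DNN initialisation from a source-language DNN.

    source_layers: list of (W, b) pairs; the last pair is the source softmax layer.
    Hidden layers are copied unchanged; the output layer is replaced by a randomly
    initialised layer with one unit per target-language tied state.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Copy hidden layers so later fine-tuning does not modify the source model.
    hidden = [(W.copy(), b.copy()) for W, b in source_layers[:-1]]
    hidden_dim = source_layers[-1][0].shape[0]
    # New softmax layer: small random weights scaled by fan-in, zero biases.
    W_out = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=(hidden_dim, num_target_states))
    b_out = np.zeros(num_target_states)
    return hidden + [(W_out, b_out)]
```

Reusing the hidden layers transfers feature extractors learned from more plentiful data, while the output layer must be rebuilt because the tied-state inventories of the two languages differ.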
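For VTLP, a widely used formulation warps the frequency axis piecewise-linearly with a warp factor drawn per utterance (often from roughly 0.9 to 1.1): frequencies below a boundary are scaled by the factor, and a second linear segment keeps the Nyquist frequency fixed. The sketch below implements that warp; the thesis's exact settings are not given, so the cut-off `f_hi` and the function name `vtlp_warp` are assumptions. The warped frequencies would typically be used to place the filterbank when extracting features.

```python
import numpy as np

def vtlp_warp(freqs, alpha, sample_rate, f_hi):
    """Piecewise-linear VTLP frequency warping.

    Below the boundary f_hi * min(alpha, 1) / alpha, frequencies are scaled by alpha.
    Above it, a second linear segment maps the remaining band so that the
    Nyquist frequency maps to itself. f_hi is a hypothetical cut-off in Hz.
    """
    freqs = np.asarray(freqs, dtype=float)
    nyquist = sample_rate / 2.0
    if not 0.0 < f_hi < nyquist:
        raise ValueError("f_hi must lie strictly between 0 and the Nyquist frequency")
    boundary = f_hi * min(alpha, 1.0) / alpha
    low = freqs * alpha
    high = nyquist - (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary) * (nyquist - freqs)
    return np.where(freqs <= boundary, low, high)
```

The two segments meet at the boundary, so the warp is continuous and monotonic; for 8 kHz telephone speech such as the OpenKWS data, `f_hi` has to sit below 4 kHz.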
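The second augmentation, adding artificial noise, usually means mixing a noise recording into each clean training utterance at a chosen signal-to-noise ratio (SNR). The abstract does not specify the noise types or SNRs, so the sketch below is a generic version under those assumptions; `add_noise_at_snr` is a hypothetical helper.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db, rng=None):
    """Mix a noise waveform into a clean waveform at a target SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Loop the noise if it is shorter than the utterance.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    # Pick a random noise segment of the same length as the utterance.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(segment ** 2)
    if p_noise == 0.0:
        raise ValueError("noise segment has zero power")
    # Scale the noise so that p_clean / p_scaled_noise == 10 ** (snr_db / 10).
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * segment
```

Each clean utterance can be mixed several times with different noise segments and SNRs to produce multiple replicas.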
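For reference, the standard MMI and sMBR objectives from the DNN sequence-training literature are shown below; the abstract does not state which variant the thesis uses. Here $\mathbf{O}_u$ is the observation sequence of utterance $u$, $W_u$ its reference word sequence, $\mathcal{S}_W$ the state sequence of hypothesis $W$, $\kappa$ the acoustic scale, $P(W)$ the language-model probability, and $A(W, W_u)$ the number of correct states in $W$ relative to the reference.

```latex
\mathcal{F}_{\mathrm{MMI}} = \sum_{u} \log
  \frac{p(\mathbf{O}_u \mid \mathcal{S}_{W_u})^{\kappa}\, P(W_u)}
       {\sum_{W} p(\mathbf{O}_u \mid \mathcal{S}_{W})^{\kappa}\, P(W)}

\mathcal{F}_{\mathrm{sMBR}} = \sum_{u}
  \frac{\sum_{W} p(\mathbf{O}_u \mid \mathcal{S}_{W})^{\kappa}\, P(W)\, A(W, W_u)}
       {\sum_{W'} p(\mathbf{O}_u \mid \mathcal{S}_{W'})^{\kappa}\, P(W')}
```

MMI raises the posterior of the reference transcription against all competing hypotheses, while sMBR maximises the expected state-level accuracy over the hypothesis space (in practice approximated by a lattice).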
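The abstract does not name its metric, but NIST OpenKWS evaluations score systems with the (actual) term-weighted value, TWV = 1 − mean over terms of [P_miss + β·P_FA], with β = 999.9 and the number of non-target trials approximated by the speech duration in seconds minus the number of true occurrences. A minimal sketch under those assumptions; `TermCounts` and `term_weighted_value` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class TermCounts:
    n_true: int     # reference occurrences of the term
    n_correct: int  # detections matching a reference occurrence
    n_false: int    # detections with no matching reference occurrence

def term_weighted_value(counts, speech_seconds, beta=999.9):
    """NIST-style TWV: 1 - mean_k [P_miss(k) + beta * P_FA(k)].

    Terms with no reference occurrences are skipped because P_miss is undefined.
    """
    losses = []
    for c in counts:
        if c.n_true == 0:
            continue
        p_miss = 1.0 - c.n_correct / c.n_true
        p_fa = c.n_false / (speech_seconds - c.n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no terms with reference occurrences")
    return 1.0 - sum(losses) / len(losses)
```

For a term that occurs once in an hour of speech, a single miss adds 1.0 to that term's loss, whereas a single false alarm adds about 999.9/3599 ≈ 0.28, which is one way to see why missed detections of rare terms are so costly.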
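The abstract does not specify the two smoothing methods, so the sketch below uses a moving average as a stand-in for the linear method and a sliding maximum (renormalised per frame) as a stand-in for the nonlinear one. Both operate along the time axis of a frames × states posteriorgram; spreading a brief posterior peak over neighbouring frames makes it less likely to be pruned during decoding, which is the intuition behind reducing misses. Function names and window sizes are hypothetical.

```python
import numpy as np

def smooth_linear(post, half_window=2):
    """Linear smoothing: moving average of each state's posterior over time."""
    post = np.asarray(post, dtype=float)
    T = post.shape[0]
    out = np.empty_like(post)
    for t in range(T):
        lo, hi = max(0, t - half_window), min(T, t + half_window + 1)
        out[t] = post[lo:hi].mean(axis=0)  # rows still sum to 1
    return out

def smooth_max(post, half_window=2):
    """Nonlinear smoothing: sliding maximum over time, renormalised per frame."""
    post = np.asarray(post, dtype=float)
    T = post.shape[0]
    out = np.empty_like(post)
    for t in range(T):
        lo, hi = max(0, t - half_window), min(T, t + half_window + 1)
        out[t] = post[lo:hi].max(axis=0)
    return out / out.sum(axis=1, keepdims=True)
```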
Furthermore, a term-dependent normalisation technique is used to control false alarms.

Since the ASR subsystem inevitably makes some recognition errors during decoding, a confidence measure is needed to judge whether a candidate is a hit or a false alarm. In most cases, the confidence measure of a single system is unreliable. To enhance it, we apply system combination over different retrieval units and different systems. In our work, two retrieval units, confusion networks (CN) and finite-state transducers (FST), are used for STD. In addition, we combine DNN-HMM and BN-GMM-HMM systems with different decoding units to improve STD performance.

Experimental results on a Tibetan corpus and the NIST OpenKWS2014 Tamil development set demonstrate the effectiveness of the methods proposed in this dissertation.
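The abstract does not say which term-dependent normalisation is used. One widely used option is sum-to-one (STO) normalisation, which rescales each term's detection scores so they sum to 1 across that term's candidates; a single global threshold can then be tuned on development data. The sketch below assumes detections are `(term, start, end, score)` tuples with positive scores; the optional exponent `gamma` and the function name are illustrative.

```python
from collections import defaultdict

def sum_to_one_normalise(detections, gamma=1.0):
    """Sum-to-one normalisation of detection scores, per search term.

    detections: list of (term, start, end, score) with score > 0.
    gamma: exponent applied to scores before normalising (1.0 = plain STO).
    """
    totals = defaultdict(float)
    for term, _, _, score in detections:
        totals[term] += score ** gamma
    return [(term, start, end, score ** gamma / totals[term])
            for term, start, end, score in detections]
```

Normalising per term puts rare and frequent terms on a comparable scale, so one threshold no longer favours the terms whose raw scores happen to be high.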
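As an illustration of CN-based retrieval, a simplified search looks for a term's words in consecutive bins of a confusion network and scores each hit by the product of the word posteriors. This is a sketch under assumptions: the bin representation (a dict from word to posterior) and the function name are hypothetical, and epsilon arcs inside a term are not skipped, which a fuller implementation would handle.

```python
def search_confusion_network(cn, term):
    """Find a (possibly multi-word) term in a confusion network.

    cn: list of bins, each a dict mapping a word (or "<eps>") to its posterior.
    term: list of words to match in consecutive bins.
    Returns a list of (first_bin_index, score), where score is the product of
    the matched words' posteriors.
    """
    hits = []
    for i in range(len(cn) - len(term) + 1):
        score = 1.0
        for offset, word in enumerate(term):
            score *= cn[i + offset].get(word, 0.0)
            if score == 0.0:
                break  # word absent from this bin: no hit at position i
        if score > 0.0:
            hits.append((i, score))
    return hits
```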
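The abstract does not give the fusion rule used for system combination. A simple, commonly used scheme, assumed here, merges detections of the same term whose time spans overlap and averages their scores over all systems, so a system that did not produce the hit contributes 0. Detections are hypothetical `(term, start, end, score)` tuples, one list per system.

```python
def combine_systems(system_outputs):
    """Fuse detection lists from several STD systems.

    system_outputs: list (one per system) of lists of (term, start, end, score).
    Overlapping detections of the same term are merged into one hit whose span
    covers them all; the fused score is the sum of each system's best score for
    that hit divided by the number of systems.
    """
    n = len(system_outputs)
    merged = []  # entries: [term, start, end, {system_index: best score}]
    for sys_idx, dets in enumerate(system_outputs):
        for term, start, end, score in dets:
            for hit in merged:
                if hit[0] == term and start < hit[2] and hit[1] < end:
                    hit[1], hit[2] = min(hit[1], start), max(hit[2], end)
                    hit[3][sys_idx] = max(hit[3].get(sys_idx, 0.0), score)
                    break
            else:
                merged.append([term, start, end, {sys_idx: score}])
    return [(t, s, e, sum(scores.values()) / n) for t, s, e, scores in merged]
```

Averaging with missing systems counted as 0 rewards hits that several systems agree on, which is one way combination can make the confidence measure more reliable than that of any single system.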
Keywords/Search Tags: Limited-resource, Spoken Term Detection, Speech Recognition, Deep Neural Network, Smooth Method, System Combination