
A Study Of An Irrelevant Variability Normalization Based Large Vocabulary Continuous Speech Recognition

Posted on: 2013-01-14; Degree: Master; Type: Thesis
Country: China; Candidate: Y Zhang; Full Text: PDF
GTID: 2268330392967083; Subject: Computer software and theory
Abstract/Summary:
With the explosion of information in modern society, dedicated recording devices are no longer the only way to produce speech data. Telephones, mobile devices, and even Voice Search all give speech researchers a convenient way to obtain useful corpora. As a result, Large Vocabulary Continuous Speech Recognition has become a key problem in automatic speech recognition. Researchers today can more easily access the varied data they need. However, this also presents some special challenges: how to handle massive amounts of data, and how to deal with the adverse effects on traditional models of the many kinds of variability (e.g., speakers, environments, channels) that may be present in the data. For example, an application that works well for one class of users may not work for others, and a system that many people can use in a quiet room may break down in harsh environments. In other words, traditional model training procedures may produce a set of diffuse models that fit variability irrelevant to phonetic classification. Irrelevant variability normalization (IVN) aims to solve this problem. It removes the irrelevant information by training a set of transformations, thereby simulating a simpler condition for each cluster. To date, IVN has been successful on small corpora, and some experiments have applied it to large vocabulary continuous speech recognition (LVCSR). In this paper, we propose new methods for improving the current IVN framework through dimension reduction and acoustic sniffing, and we combine them with several state-of-the-art methods to build a complete LVCSR system. We perform speech recognition experiments on a 300-hour telephone recording corpus and a 7,500-hour Voice Search corpus. Compared with conventional methods, the proposed discriminative IVN training and i-vector-based acoustic sniffing capture more useful information by removing irrelevant noise.
In addition, our new i-vector method is more robust and efficient than the traditional acoustic sniffing algorithm. Experimental results demonstrate that the proposed discriminative IVN training and i-vector-based acoustic sniffing yield a relative improvement in recognition accuracy of more than 20% on the swb corpus. Moreover, our new acoustic sniffing method can be applied to the 7,500-hour Voice Search task, where it is much more efficient than the traditional GMM-based approach.
Keywords/Search Tags: Large Vocabulary Continuous Speech Recognition, Irrelevant Variability Normalization, Acoustic Sniffing, i-vector, Speaker recognition