
Research On Problems In Spoken Language Identification With Short-Duration Segments

Posted on: 2015-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: M G Wang
GTID: 2268330431950127
Subject: Signal and Information Processing
Abstract/Summary:
With the evolution of language recognition technology, most current language recognition systems perform well enough for practical deployment when test utterances are longer than 30 seconds. In some urgent circumstances, however, requiring a 30 s test utterance is not acceptable to many users, and when the duration of test utterances is reduced to 10 s or less, even state-of-the-art systems cannot deliver good performance. This dissertation focuses on the performance degradation of language recognition systems under the short-duration test condition and proposes several methods to address it.
Short-duration utterances are easily affected by noise and are hard to represent accurately. Having analyzed the reasons for this, we replaced conventional statistical modeling with an exemplar-based method to overcome the scarcity of speech in short segments. By constructing templates and applying an encoding phase, we generate features that are believed to be more robust to the variability introduced by shortened duration. Experiments show that the proposed method effectively improves system performance under the short-duration test condition.
To tackle the problem of acquiring enough language-dependent information from a short speech segment, this dissertation proposes a classifier based on deep neural networks. Through multiple layers of nonlinear mapping, the deep neural network produces high-level representations of short-duration samples that are more abstract and more discriminative, which leads to better classification results. In addition, a dropout fine-tuning scheme is introduced to help avoid overfitting when training the deep neural network classifier. Finally, with a hierarchical selection of training samples according to utterance duration, the deep neural network classifier significantly improves the short-duration test performance of a language recognition system.
To further exploit the representational power of deep neural networks, this dissertation proposes a bottleneck feature extractor that draws more condensed, language-dependent, and discriminative features from the highly correlated speech representation of a short-duration utterance. Replacing shifted delta cepstral (SDC) features with the bottleneck features yields large improvements under the short-duration test condition. Furthermore, we expanded the bottleneck features in a shifted-delta manner and then reduced their dimensionality nonlinearly with a deep auto-encoder. Experimental results show that, after shifted-delta expansion and nonlinear dimensionality reduction, features extracted by deep neural networks further improve the performance of a language recognition system when the duration of the test utterances is reduced.
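The shifted-delta expansion mentioned above can be illustrated with a minimal sketch. It assumes the standard shifted-delta formulation, in which each frame t is replaced by k stacked delta blocks, block i being c(t + iP + d) - c(t + iP - d). The function name `shifted_delta` and the default values d=1, P=3, k=7 are assumptions chosen for illustration; the abstract does not state the configuration used in the dissertation, and the input may be any per-frame feature vectors (for example, bottleneck outputs).

```python
def shifted_delta(frames, d=1, p=3, k=7):
    """Shifted-delta expansion of a sequence of per-frame feature vectors.

    frames: list of equal-length lists of floats (one list per frame).
    d: delta spread, p: shift between blocks, k: number of stacked blocks.
    These defaults are illustrative assumptions, not values from the thesis.
    """
    n_frames = len(frames)
    expanded = []
    # Keep only frames whose whole context window lies inside the utterance.
    for t in range(d, n_frames - (k - 1) * p - d):
        stacked = []
        for i in range(k):
            ahead = frames[t + i * p + d]
            behind = frames[t + i * p - d]
            # Block i: element-wise difference of the two shifted frames.
            stacked.extend(a - b for a, b in zip(ahead, behind))
        expanded.append(stacked)
    return expanded


# Toy input: 30 frames of 2-dimensional features that grow linearly in time.
toy = [[float(t), 2.0 * t] for t in range(30)]
sdc_like = shifted_delta(toy)
print(len(sdc_like), len(sdc_like[0]))
```

Each output vector has k times the input dimensionality, which is why a dimensionality-reduction step such as the deep auto-encoder described above would follow the expansion.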
Keywords/Search Tags: language recognition, short-term test, encoding, deep neural networks, bottleneck features, deep auto-encoders