
Research On Key Technologies Of Embedded Human-Machine Speech Interaction System

Posted on: 2015-01-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Wang
Full Text: PDF
GTID: 1268330428999927
Subject: Signal and Information Processing
Abstract/Summary:
Speech is the most natural and convenient form of human communication. It is also one of the most powerful modes of human-machine interaction and is expected to play a core role in the next generation of human-machine interfaces. Thanks to the popularity of embedded mobile devices such as smartphones and tablets, and to the maturing of speech technologies and applications, voice interaction is entering the daily lives of many people all over the world. However, practical embedded human-machine voice interaction systems still face many challenges, such as the power consumption of embedded devices, limited computing resources, and complex acoustic environments for speech recognition. In this context, this dissertation focuses on embedded human-machine speech interaction systems. It presents a systematic, in-depth study of the key technologies common to such systems and introduces innovations in the following three aspects.

First, to address the robustness of speech recognition in the interaction system, this dissertation proposes a model compensation algorithm that accounts for both additive noise and channel distortion. The algorithm first estimates the additive noise from the non-speech part of the utterance and then estimates the channel function using the conventional EM algorithm. It then jointly compensates the mismatched acoustic model in the cepstral domain. The proposed method significantly improves speech recognition performance in tests under both noisy conditions and conditions with channel distortion.
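For orientation, the joint effect of additive noise and channel distortion on clean speech is commonly written in the cepstral domain with the mismatch function below. The abstract does not state the exact form used in the dissertation, so this is the standard formulation from the model compensation literature, not necessarily the author's own. All symbols are assumptions introduced here: $x$, $y$, $n$, and $h$ are the cepstral vectors of clean speech, observed speech, additive noise, and the channel; $C$ is the DCT matrix and $C^{-1}$ its (pseudo-)inverse; $\log$ and $\exp$ act element-wise.

```latex
\[
  y = x + h + C \log\!\left( 1 + \exp\!\left( C^{-1} (n - x - h) \right) \right)
\]
```

Under this model, once $n$ and $h$ have been estimated, the clean-speech model parameters can be mapped to their noisy-condition counterparts, which is the sense in which compensation is "joint".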
Furthermore, the proposed method can dynamically track variations in the environment, which yields a better user experience than traditional robust algorithms.

Second, to meet the need for medium-vocabulary continuous speech recognition on resource-limited embedded devices, the dissertation proposes a novel decoding algorithm based on language model adjustment. The algorithm employs a simplified search over a single-tree dictionary instead of the traditional search over a tree-copy dictionary, whose search space grows exponentially with dictionary size. To recover from search errors introduced by the single-tree dictionary, the information stored in each node of the tree dictionary is adjusted and updated according to the language model score. The complexity of the proposed method decreases by an order of magnitude with little loss of recognition accuracy. The dissertation also proposes a confidence measure algorithm based on phonetic clustering for the back-end module of embedded speech recognition. It generates a more compact phonetic subspace by clustering phones using KL divergence. This subspace is used to estimate the normalization term of the confidence score more efficiently and accurately. The method also significantly reduces computational complexity with little loss in confidence measure performance.

Lastly, the dissertation presents a systematic solution for fuzzy speech retrieval, to satisfy the demands of spoken queries over more than ten million entries. The proposed solution refines and integrates several core algorithms, such as second-level inverted indexing, block-based dynamic programming, and re-ranking in speech recognition.
With the complete solution deployed in a practical embedded system, users can obtain the expected results more casually, using input in various forms such as a text fragment of an item, an abbreviation of an item, or combinations of these. The solution supports free-form voice input and achieves high retrieval performance, which significantly improves the user experience of human-machine interaction.
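The general idea of index-based shortlisting followed by dynamic-programming re-ranking can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: it uses a single-level character-bigram inverted index and plain Levenshtein distance in place of the second-level indexing and block-based dynamic programming named above, and it does not handle abbreviations. All function names, parameters, and sample entries are hypothetical.

```python
from collections import defaultdict


def bigrams(text):
    """Return the set of character bigrams of a string (spaces removed)."""
    s = text.replace(" ", "").lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def build_index(entries):
    """Inverted index: map each bigram to the ids of entries containing it."""
    index = defaultdict(set)
    for entry_id, entry in enumerate(entries):
        for gram in bigrams(entry):
            index[gram].add(entry_id)
    return index


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def search(query, entries, index, shortlist_size=50, top_k=3):
    """Shortlist entries by shared bigrams, then re-rank by edit distance."""
    votes = defaultdict(int)
    for gram in bigrams(query):
        for entry_id in index.get(gram, ()):
            votes[entry_id] += 1
    # Keep the entries sharing the most bigrams with the query.
    shortlist = sorted(votes, key=lambda i: (-votes[i], i))[:shortlist_size]
    q = query.replace(" ", "").lower()
    ranked = sorted(
        shortlist,
        key=lambda i: (edit_distance(q, entries[i].replace(" ", "").lower()), i),
    )
    return [entries[i] for i in ranked[:top_k]]


# Hypothetical entries and a query with a recognition-style error.
entries = [
    "Beijing Railway Station",
    "Beijing West Railway Station",
    "Shanghai Railway Station",
]
index = build_index(entries)
results = search("beijing raliway station", entries, index)
```

The shortlist step keeps the expensive dynamic-programming comparison away from most entries, which is the reason an index is needed at all when the collection holds millions of items.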
Keywords/Search Tags: human-machine interaction, embedded system, noise robustness, decoding of speech recognition, confidence measure, speech information retrieval