Speech recognition in mobile environments
Posted on: 2001-05-27 | Degree: Ph.D. | Type: Thesis | University: Carnegie Mellon University | Candidate: Huerta, Juan M | Full Text: PDF | GTID: 2468390014452602 | Subject: Engineering

Abstract/Summary:

The growth of cellular telephony, combined with recent advances in speech recognition technology, creates sizeable opportunities for mobile speech recognition applications. Classic robustness techniques previously proposed for speech recognition yield only limited reductions in the degradation introduced by the idiosyncrasies of mobile networks. These sources of degradation include distortion introduced by the speech codec as well as artifacts arising from channel errors and discontinuous transmission.

In this thesis we focus on characterizing the distortion that the speech codec introduces into the speech signal, and we propose methods for reducing the detrimental effect of coding on recognition accuracy. The initial focus of this thesis is on the full-rate GSM codec (FR-GSM). We propose a method to generate recognition features directly from codec parameters. We show that selectively constructing a cepstral feature vector from the GSM codec parameters reduces the effect of coding on recognition.

The later parts of this work address weighted acoustic modeling for robust speech recognition. The approach is motivated by the observation that coding does not distort all phones in a GSM-coded corpus to the same extent. We first establish a set of phonetic distortion classes by analyzing the distribution of the log spectral distortion that the codec introduces to each phone. These classes are then used to estimate an optimal weighted combination of acoustic models according to the average distortion encountered by each class.
A relative reduction of almost 70% in the degradation introduced by the GSM codec was achieved using this method.

Weighted acoustic modeling based on instantaneous distortion is introduced as an alternative to the method based on average distortion information. When the extent of cepstral distortion introduced by coding is known, weighted acoustic modeling provides a reduction of about 50% in the word error rate introduced by concurrent GSM and CELP coding. We propose two methods to estimate the instantaneous distortion information: one based on recoding sensitivity and another based on long-term predictability. Because the relation between the time domain and the log-spectral domain is nonlinear, the proposed estimates of the instantaneous distortion do not perform as well as algorithms based on knowledge of the cepstral distortion. However, we show that the proposed instantaneous distortion estimates can help obtain the best recognition results established in the baseline conditions while using only 50% of the baseline Gaussian density computations.

Keywords/Search Tags: Recognition, Mobile, GSM codec, Distortion, Weighted acoustic modeling
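
The abstract refers to the log spectral distortion introduced to each phone and to a weighted combination of acoustic models. As a rough illustration only, the sketch below computes a common form of log spectral distortion (the root-mean-square difference, in dB, between two power spectra), averages it per phone, and combines two model likelihoods with a single weight. The function names, the input layout, and the convex-combination form of the weighting are assumptions made for this example; they are not taken from the thesis.

```python
import math


def log_spectral_distortion(clean_power, coded_power, floor=1e-12):
    """RMS difference in dB between two power spectra of equal length.

    `floor` (an assumed safeguard) avoids taking the log of zero.
    """
    if len(clean_power) != len(coded_power):
        raise ValueError("spectra must have the same number of bins")
    diffs = [
        10.0 * math.log10(max(c, floor)) - 10.0 * math.log10(max(d, floor))
        for c, d in zip(clean_power, coded_power)
    ]
    return math.sqrt(sum(x * x for x in diffs) / len(diffs))


def average_distortion_per_phone(frames):
    """Average the frame-level distortion for each phone label.

    `frames` is an iterable of (phone_label, clean_power, coded_power)
    tuples; this layout is hypothetical.
    """
    totals = {}
    for phone, clean, coded in frames:
        running_sum, count = totals.get(phone, (0.0, 0))
        distortion = log_spectral_distortion(clean, coded)
        totals[phone] = (running_sum + distortion, count + 1)
    return {phone: s / n for phone, (s, n) in totals.items()}


def weighted_likelihood(lik_clean_model, lik_coded_model, weight):
    """Convex combination of two acoustic-model likelihoods.

    The convex form is an assumption; the abstract does not specify
    how the models are combined.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * lik_clean_model + (1.0 - weight) * lik_coded_model
```

In a setup like this, phones could be grouped into distortion classes by their average distortion, with one weight chosen per class. How the thesis actually defines the classes and estimates the weights is not stated in the abstract.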