Whisper speech processing: Analysis, modeling, and detection with applications to keyword spotting
Posted on: 2013-04-25 | Degree: Ph.D. | Type: Dissertation
University: The University of Texas at Dallas | Candidate: Zhang, Chi
GTID: 1458390008483931 | Subject: Engineering

Abstract/Summary:
Human-to-human and human-to-machine speech interaction can occur with different levels of vocal effort depending on speaking style, emotion, and need for privacy. In general, the speech signal can be grouped into five categories based on vocal effort: whisper, soft, neutral, loud, and shouted. Current speech processing systems are designed to deal with neutral speech. However, whispered speech has a profound negative impact on the performance of speech processing systems due to fundamental differences in the production mechanism. This dissertation focuses on identification of whispered speech embedded within a neutral speech audio stream, followed by keyword spotting for the detected whispered segments. To conduct analysis and algorithm development for this study, three corpora are developed. This dissertation then proposes several novel algorithms to address detection of whisper-islands within an audio stream in both close-talk and distant speech capture environments. These algorithms follow a two-stage framework: the first stage detects the vocal effort change points, and the second stage classifies the vocal effort of the segments delimited by the detected change points. Vocal effort classification algorithms are developed in both model-based and model-less frameworks. To cope with the degradation of whisper-island detection caused by environmental effects, two algorithms are developed. The first employs a microphone array enhancement technique.
The second algorithm addresses distant whisper-island detection using model information for both discriminant feature development and vocal effort classification.

The second goal of this dissertation is to improve keyword spotting performance for embedded whispered speech. The proposed algorithm employs a vector Taylor series expansion to estimate the feature vectors for whispered speech, which is adapted to the neutral speech of corresponding content. Maximum a posteriori (MAP) adaptation and maximum likelihood linear regression (MLLR) are both used to adapt the acoustic model trained on neutral speech. After deploying the proposed advancements, keyword spotting performance for whisper is improved, supporting the concept that a vector Taylor series expansion is helpful for speech recognition applications involving whispered speech. This dissertation has therefore contributed several advanced algorithms for whisper-island detection, as well as keyword spotting enhancement for whisper. Such advancements will ultimately contribute to improved robust speech processing and recognition solutions in real environments.

Keywords/Search Tags: Speech, Whisper, Keyword spotting, Vocal effort, Detection
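
The two-stage framework named in the abstract (change-point detection, then vocal effort classification of each resulting segment) can be illustrated with a minimal sketch. The abstract does not specify the features, distance measure, or classifiers used in the dissertation, so everything here is an assumption for illustration only: a single hypothetical per-frame feature, a sliding-window mean difference as the change-point score, and a fixed threshold standing in for the classifier. The function names and parameters are likewise hypothetical.

```python
from statistics import fmean


def detect_change_points(feature, win=5, threshold=1.0):
    """Stage 1 (illustrative): flag frames where the mean of the next `win`
    frames differs from the mean of the previous `win` frames by more than
    `threshold`, keeping only local maxima of that difference."""
    n = len(feature)
    dist = [0.0] * n
    for t in range(win, n - win + 1):
        left = fmean(feature[t - win:t])
        right = fmean(feature[t:t + win])
        dist[t] = abs(right - left)

    points = []
    for t in range(win, n - win + 1):
        if dist[t] <= threshold:
            continue
        lo, hi = max(0, t - win), min(n, t + win + 1)
        is_local_max = dist[t] == max(dist[lo:hi])
        far_from_last = not points or t - points[-1] >= win
        if is_local_max and far_from_last:
            points.append(t)
    return points


def classify_segments(feature, change_points, whisper_threshold=2.5):
    """Stage 2 (illustrative): label each segment between change points.
    A fixed threshold on the segment mean is a placeholder classifier;
    treating high feature values as whisper is an arbitrary assumption."""
    bounds = [0] + list(change_points) + [len(feature)]
    segments = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end <= start:
            continue
        mean_value = fmean(feature[start:end])
        label = "whisper" if mean_value > whisper_threshold else "neutral"
        segments.append((start, end, label))
    return segments


# Synthetic stream: neutral, then a whisper-island, then neutral again.
stream = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
points = detect_change_points(stream)
print(classify_segments(stream, points))
```

Separating the two stages means the classifier only has to label whole segments rather than individual frames, which is the structure the abstract describes; a real system would replace both placeholders with the detection and classification methods developed in the dissertation.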