
Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition

Posted on: 2006-11-19
Degree: Ph.D
Type: Thesis
University: University of Colorado at Boulder
Candidate: Zhang, Xianxian
Full Text: PDF
GTID: 2458390005498669
Subject: Engineering
Abstract/Summary:
Capturing clean, distortion-free speech under distant-talker conditions in noisy car environments has attracted much attention. This thesis focuses on robust in-vehicle speech processing. The first thesis goal addresses the problem of beamforming for in-vehicle speech enhancement and robust speech recognition. An analysis of driver head/body movement during voice interaction motivates the development of two novel beamforming algorithms: (i) a constrained switched adaptive beamforming (CSA-BF) algorithm; and (ii) a combined fixed/adaptive beamforming (CFA-BF) algorithm. We investigate the performance of both methods, with a comparison to classic delay-and-sum beamforming (DASB), in realistic car conditions using a corpus of data recorded in various car noise environments from across the United States. The second thesis goal addresses the problem of speaker tracking and localization based on an integrated audio-visual framework. This robust audio-visual integration system is effective for source tracking and speech enhancement in an in-vehicle speech dialog system. The third thesis goal addresses combining the strength of array processing in suppressing directional noise with the strengths of single-channel methods that include speech spectral constraints or psychoacoustically motivated processing. The fourth thesis goal addresses the problem of in-set/out-of-set speaker recognition, where we identify whether a speaker belongs to a group of predefined speakers. An effective algorithm is developed which employs spectral-based features within a Gaussian Mixture Model-Universal Background Model (GMM-UBM) framework, enhanced by discriminative speech frame selection (DSFS). The working scheme of DSFS consists of two steps: speech frame analysis and discriminative frame selection. Compared with traditional GMM speaker identification, DSFS selects only discriminative speech frames and therefore focuses only on discriminative features.
The fifth thesis goal considers system applications. We demonstrate that the proposed speech enhancement algorithms are effective for hearing-impaired people in noisy in-vehicle environments, and that an in-set/out-of-set speaker identification system is effective for in-vehicle speaker identification. Collectively, the advances made in this thesis contribute significantly to the robustness of interactive speech systems for speech recognition, speech enhancement, and in-set speaker identification in in-vehicle environments.
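For readers unfamiliar with the classic delay-and-sum beamforming (DASB) baseline named in the abstract, the following is a minimal illustrative sketch, not the thesis's implementation and not the CSA-BF or CFA-BF algorithms. It assumes the per-microphone arrival delays are already known and are whole numbers of samples; the function names `delay_and_sum` and `simulate_array`, the test signal, and the noise model are all hypothetical choices made for illustration.

```python
import random


def simulate_array(source, delays, noise_std, seed=0):
    """Hypothetical test helper: build one noisy, delayed copy of `source`
    per microphone. Each delay is a non-negative integer sample count."""
    rng = random.Random(seed)
    channels = []
    for d in delays:
        channel = []
        for t in range(len(source)):
            # The speech reaches this microphone d samples late.
            clean = source[t - d] if t >= d else 0.0
            # Independent Gaussian noise per microphone (an assumption).
            channel.append(clean + rng.gauss(0.0, noise_std))
        channels.append(channel)
    return channels


def delay_and_sum(channels, delays):
    """Classic delay-and-sum: undo each channel's known delay, then average.

    channels: equal-length sample lists, one per microphone.
    delays:   arrival delay of each channel in samples (integers >= 0).
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        # Advance the channel by d samples so the target speech lines up.
        for t in range(n - d):
            out[t] += signal[t + d]
    # The last max(delays) samples are averaged over fewer real
    # contributions, so they are attenuated; callers may discard them.
    return [value / len(channels) for value in out]
```

After alignment the target speech adds coherently across microphones, while noise that is uncorrelated between microphones averages down. Real in-vehicle arrays need delay estimation and usually fractional delays, neither of which this sketch attempts.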
Keywords/Search Tags: Speech, Speaker, In-vehicle, Robust, Frame selection, Environments, Processing