A Study Of Speech Enhancement And Recognition Based On Microphone Array Processing

Posted on: 2011-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: X X Li
GTID: 2178360308967476
Subject: Information and Communication Engineering

Abstract/Summary:
Automatic speech recognition (ASR) techniques can already achieve quite high recognition rates for clean speech. In practical application environments, however, environmental noise and reverberation, together with interference from other sound sources, can cause a mismatch between the speech features to be recognized and the training templates, and thus severely degrade the performance of the recognition system. This thesis concerns the development of array processing methods for wideband speech signals in the context of an ASR system with a small microphone array in the front end. The goal is to increase the probability of correct speech recognition in practical environments through joint spatial-temporal processing.

For speech source localization, a wideband direction-of-arrival (DOA) estimation method based on the ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques) algorithm is developed, and further improved by combining it with multi-channel linear prediction analysis of speech signals as well as SNR estimation. Experiments with a small microphone array confirm that this method can achieve very high spatial resolution for wideband speech signals, far superior to conventional beamforming methods, yet without the beam scanning across the entire angular domain required by other typical high-resolution methods. The source localization results are then used to guide the subsequent array processing to extract speech signals from the specified speaker.

Most current microphone array ASR systems comprise two independent stages: array signal processing and feature recognition. This thesis considers the processing in those two stages jointly: outputs of the recognition stage are fed back to the front end, and the array filtering coefficients are then adjusted via an optimization procedure in which the likelihood of the correct transcription is maximized for a selected vocabulary. In addition, a global search algorithm is applied to further improve the performance of this joint optimization scheme. Unlike conventional array processing, which aims to enhance the signal waveform, the approach here enhances speech features to better match the recognition model, thus directly increasing the likelihood of correct hypotheses in recognition. Experiments clearly demonstrate the performance improvement of the proposed approach.
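
As a rough illustration of the rotational-invariance idea behind ESPRIT-based localization, the following is a minimal sketch of standard narrowband ESPRIT for a uniform linear array. It is not the wideband method developed in the thesis: the multi-channel linear prediction and SNR estimation steps are not reproduced, and the array geometry (half-wavelength spacing), the function names `simulate_array` and `esprit_doa`, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_array(angles_deg, num_sensors=8, num_snapshots=2000,
                   snr_db=20.0, spacing_over_wavelength=0.5, seed=0):
    """Simulate narrowband snapshots for a uniform linear array (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    angles = np.radians(np.asarray(angles_deg, dtype=float))
    sensor_index = np.arange(num_sensors)[:, None]
    # Steering matrix: phase grows linearly with sensor index and sin(angle).
    steering = np.exp(2j * np.pi * spacing_over_wavelength
                      * sensor_index * np.sin(angles)[None, :])
    shape_s = (angles.size, num_snapshots)
    shape_n = (num_sensors, num_snapshots)
    # Unit-power, uncorrelated complex Gaussian sources.
    sources = (rng.standard_normal(shape_s)
               + 1j * rng.standard_normal(shape_s)) / np.sqrt(2)
    # Complex Gaussian sensor noise scaled to the requested SNR.
    noise = (rng.standard_normal(shape_n)
             + 1j * rng.standard_normal(shape_n)) / np.sqrt(2)
    return steering @ sources + 10 ** (-snr_db / 20) * noise


def esprit_doa(snapshots, num_sources, spacing_over_wavelength=0.5):
    """Estimate arrival angles (degrees) with least-squares ESPRIT."""
    num_snapshots = snapshots.shape[1]
    covariance = snapshots @ snapshots.conj().T / num_snapshots
    # eigh returns eigenvalues in ascending order, so the signal subspace
    # is spanned by the last `num_sources` eigenvectors.
    _, eigvecs = np.linalg.eigh(covariance)
    signal_subspace = eigvecs[:, -num_sources:]
    # Two overlapping subarrays, shifted by one sensor.
    upper = signal_subspace[:-1, :]
    lower = signal_subspace[1:, :]
    # Solve upper @ rotation = lower in the least-squares sense.
    rotation, *_ = np.linalg.lstsq(upper, lower, rcond=None)
    # Eigenvalue phases of the rotation encode the inter-sensor phase shifts.
    phases = np.angle(np.linalg.eigvals(rotation))
    sines = phases / (2 * np.pi * spacing_over_wavelength)
    return np.sort(np.degrees(np.arcsin(np.clip(sines, -1.0, 1.0))))


if __name__ == "__main__":
    data = simulate_array([-20.0, 35.0])
    print(esprit_doa(data, num_sources=2))
```

The angles come directly from the eigenvalues of the estimated rotation matrix, so no scan over candidate directions is needed, which is the property the abstract contrasts with other high-resolution methods.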
Keywords/Search Tags: Microphone array, Speech recognition, Wideband ESPRIT algorithm, FIR filtering, Numerical optimization