Robust Speech Recognition Based On Local Time-frequency Analysis

Posted on: 2004-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: C Xu
Full Text: PDF
GTID: 2178360182983708
Subject: Communication and Information System
Abstract/Summary:
Humans have been interested in speech processing for more than two hundred years. With the development of computing, great progress was made in speech processing techniques in the 1980s and 1990s. State-of-the-art automatic speech recognition (ASR) systems then appeared, performing well on quiet, high-quality speech. At the same time, however, practical problems arose. Because environmental noise is inherent to any given environment, the noise robustness of ASR systems became one of the most urgent problems.

Noise robustness is a key problem in statistical signal processing as well as in ASR. Statistical signal processing techniques such as Wiener filtering, Kalman filtering, and MMSE estimation can therefore also be applied to enhance the speech signal for ASR. Spectral subtraction, a statistical technique that assumes the noise changes slowly, is now widely employed by many researchers. To model noise more specifically, the parallel model combination (PMC) technique employs an HMM (often with one emitting state and a multi-Gaussian mixture) to describe the additive noise. Adaptive training, which was originally designed for speaker adaptation, can also be applied to environment adaptation.

However, the above techniques share the disadvantage of sensitivity to noise type. In the statistical techniques, noise is assumed to follow a given statistical property; for example, noise is assumed to be stationary in Wiener filtering and slowly changing in spectral subtraction. In PMC, good performance can be achieved only if the real noise fits the HMM trained a priori. Similarly, the expected effect may not be obtained with adaptation techniques unless the noise properties at test time coincide with those in adaptive training. To avoid assumptions about the noise, Cooke and his colleagues proposed Missing Data (MD) techniques, whose key idea is to suppress the unreliable time-frequency areas in the likelihood calculation.
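The Missing Data idea mentioned above can be illustrated with a minimal sketch. It assumes a diagonal-covariance Gaussian observation model and a per-component local SNR estimate that is simply given; the function names, the 0 dB threshold, and the plain marginalization rule are illustrative assumptions, not the formulation used in the thesis.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def missing_data_log_likelihood(frame, means, variances, local_snr_db,
                                snr_threshold_db=0.0):
    """Diagonal-Gaussian log-likelihood over reliable components only.

    Components whose estimated local SNR falls below the threshold
    (a hypothetical choice) are treated as missing. For a diagonal
    Gaussian, marginalizing them out amounts to dropping their terms.
    """
    total = 0.0
    for x, mean, var, snr_db in zip(frame, means, variances, local_snr_db):
        if snr_db >= snr_threshold_db:  # component judged reliable
            total += gaussian_log_pdf(x, mean, var)
    return total


if __name__ == "__main__":
    # Second component is far from the model mean but flagged unreliable,
    # so it does not penalize the score.
    print(missing_data_log_likelihood(
        frame=[1.0, 5.0], means=[1.0, 0.0], variances=[1.0, 1.0],
        local_snr_db=[10.0, -10.0]))
```

The sketch makes the dependence on `local_snr_db` explicit: the mask is only as good as the local SNR estimate that produces it.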
In MD, however, reliability is determined by estimating the local signal-to-noise ratio (SNR), which is no easier than estimating the noise itself. Motivated by the idea of MD, we extend it and give a common means of applying MD and PMC, namely introducing noise-model or local-SNR parameters into the likelihood function. It can be observed that local time-frequency regions with high power spectral density are less corrupted by noise. Consequently, we can suppress the local time-frequency regions with low power spectral density to improve system robustness; this is the key idea of the Key Data (KD) techniques presented in this thesis.

In KD, large local areas are analyzed. In a more elaborate approach, the harmonic structure of the speech signal can be exploited. Speech, especially voiced speech, shows a clear harmonic structure, which often survives in noise (humans also fail to understand speech if the harmonic structure is destroyed). Hence we also present some research on robust ASR based on a harmonic model.

Both the MD-based and the harmonic-based methods perform well in improving the noise robustness of ASR systems. Combining the two methods, and possibly other methods, is under consideration for future work.
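As a rough illustration of the Key Data idea in the abstract (de-emphasizing local time-frequency regions with low power spectral density), the sketch below derives soft weights from the observed band powers alone, with no noise or SNR estimate. The linear ramp, the -20 dB floor, and all function names are hypothetical choices made for this example; the abstract does not specify the actual weighting.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def key_data_weights(band_power, floor_db=-20.0):
    """Map band powers to weights in [0, 1] relative to the frame peak.

    The peak band gets weight 1; bands at or below floor_db (relative
    to the peak) get weight 0; a linear ramp in dB lies in between.
    The ramp and floor are assumptions for illustration only.
    """
    peak = max(band_power)
    if peak <= 0.0:
        return [0.0] * len(band_power)  # silent frame: nothing to trust
    weights = []
    for p in band_power:
        if p <= 0.0:
            weights.append(0.0)
            continue
        rel_db = 10.0 * math.log10(p / peak)  # always <= 0
        weights.append(min(1.0, max(0.0, 1.0 - rel_db / floor_db)))
    return weights


def weighted_log_likelihood(frame, means, variances, weights):
    """Diagonal-Gaussian log-likelihood with per-component weights."""
    return sum(w * gaussian_log_pdf(x, m, v)
               for x, m, v, w in zip(frame, means, variances, weights))


if __name__ == "__main__":
    print(key_data_weights([100.0, 10.0, 1.0, 0.0]))
```

Unlike an SNR-based mask, these weights depend only on the noisy observation, which reflects the motivation stated in the abstract for relying on high-power regions.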
Keywords/Search Tags: speech recognition, noise, robustness, HMM, MFCC, time-frequency local, harmonic