
Probabilistic models of time-domain speech signals

Posted on: 2008-12-22
Degree: Ph.D
Type: Thesis
University: University of Toronto (Canada)
Candidate: Achan, Kannan
Full Text: PDF
GTID: 2448390005976650
Subject: Computer Science
Abstract/Summary:
This thesis addresses the problem of modeling speech directly in the time domain and reconstructing time-domain speech signals from phaseless feature-domain representations. Processing speech in the time domain is generally not favored because accounting for variability in phase is not straightforward. Instead, it is common to process speech in a feature domain where the phase components have been removed. However, many applications of speech processing require that the output be in the time domain. In this case, speech signals can be processed in a phase-free feature domain and then transformed to the time domain by reconstructing the phase, or they can be processed directly in the time domain. In this thesis, we study how to reconstruct time-domain speech signals from phase-free feature representations and how to model and analyze speech signals directly in the time domain.

In the first part of this thesis, we address the problem of inverting a feature-domain representation of speech to recover an estimate of the underlying time-domain speech waveform. In particular, we consider inverting spectrograms (short-time magnitude spectra), since they are among the most popular feature-domain representations of speech. A significant problem with techniques that manipulate spectrograms is that the output spectrogram does not include a phase component, which is needed to create a time-domain signal that has good perceptual quality. We describe a probabilistic generative model of time-domain speech signals and their spectrograms, and show how an efficient optimizer can be used to find the maximum a posteriori speech signal, given the spectrogram. In contrast to techniques that alternate between estimating the phase and a spectrally consistent signal, our technique directly infers the speech signal, thus jointly optimizing the phase and the spectrally consistent signal. We compare our technique with a standard method in terms of improvements in signal-to-noise ratio, and we also provide audio files that demonstrate the improvement in perceptual quality that our technique offers.

In the second part of this thesis, we present a purely time-domain approach to speech processing which identifies waveform samples at the boundaries between glottal pulse periods (in voiced speech) or at the boundaries of unvoiced segments. An efficient algorithm for inferring these boundaries and estimating the average spectra of voiced and unvoiced regions is derived from a simple probabilistic generative model. Competitive results are presented on pitch tracking, voiced/unvoiced detection and timescale modification; all these tasks and several others can be performed using the single segmentation provided by inference in the model.

The contributions of this thesis offer a rich algorithmic framework for modeling speech signals in both the time and time-frequency domains. In the process, we also show that probabilistic generative models offer a natural way to represent, reason about, and learn about the underlying acoustic observations.
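For readers unfamiliar with the alternating techniques the abstract contrasts itself with, the following is a minimal sketch of one such scheme, in the style of the Griffin-Lim algorithm: it repeatedly resynthesizes a signal from the known magnitudes and the current phase estimate, then re-estimates the phase from that signal. It is not the thesis's method (which infers the signal directly), and the abstract does not name the standard method it compares against, so treating it as Griffin-Lim-like is an assumption. All function names, the Hann window, and the frame and hop sizes are illustrative choices, not taken from the thesis.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform with a Hann window (illustrative settings)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length, n_fft=256, hop=64):
    """Weighted overlap-add resynthesis from a complex short-time spectrum."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        x[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2
    # Samples covered only by zero-valued window taps stay at zero.
    return x / np.maximum(norm, 1e-8)

def alternating_inversion(magnitude, length, n_iter=50, n_fft=256, hop=64, seed=0):
    """Alternate between a phase estimate and a spectrally consistent signal."""
    rng = np.random.default_rng(seed)
    # Start from random phase, since the spectrogram carries none.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Step 1: signal whose STFT is closest to the magnitude/phase pair.
        x = istft(magnitude * phase, length, n_fft, hop)
        # Step 2: keep only the phase of that signal's STFT.
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(magnitude * phase, length, n_fft, hop)

def spectral_error(x, magnitude, n_fft=256, hop=64):
    """Relative mismatch between the magnitudes of x and the target."""
    diff = np.abs(stft(x, n_fft, hop)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)

if __name__ == "__main__":
    # Synthetic two-tone test signal; a real use would load recorded speech.
    t = np.arange(256 + 64 * 30) / 8000.0
    signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
    target = np.abs(stft(signal))
    estimate = alternating_inversion(target, len(signal))
    print("relative spectral error:", spectral_error(estimate, target))
```

The phase and the signal are updated in separate steps here, which is the property the abstract contrasts with jointly optimizing both by inferring the signal directly.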
Keywords/Search Tags:Speech, Model, Probabilistic, Phase, Thesis, Directly