Robust structured voice extraction for flexible expressive resynthesis

Posted on: 2008-09-26
Degree: Ph.D.
Type: Thesis
University: Stanford University
Candidate: Jinachitra, Pamornpol
Full Text: PDF
GTID: 2442390005956941
Subject: Engineering
Abstract/Summary:
Parametric representation of audio reduces the amount of data needed to represent a sound. If chosen carefully, the parameters can capture the expressiveness of the sound while reflecting the production mechanism of the sound source, and thus allow intuitive control for modifying the original sound in a desirable way. Achieving such a parametric encoding requires algorithms that can robustly identify the model parameters even from noisy recordings. The result is not only an expressive and flexible coding system but also a model-based speech enhancement that reconstructs speech embedded in noise cleanly, free of the musical noise usually associated with filter-based approaches. This thesis describes a combination of analysis algorithms that achieves automatic encoding of a human voice recorded in noise. The source-filter model is employed to parameterize speech sounds, especially voiced speech, and an iterative joint estimation of the glottal source and vocal tract parameters, based on Kalman filtering and the expectation-maximization algorithm, is presented. Finding the right production model for each speech segment requires speech segmentation, which is especially challenging in noise. A switching state-space model is adopted to represent the underlying speech production mechanism, involving the smoothly varying hidden variables and their relationship to the observed speech. A technique called the unscented transform is incorporated into the algorithm to improve segmentation performance in noise. In addition, during voiced periods, the choice of the glottal source model requires detection of the glottal closure instants. A dynamic programming-based algorithm with a flexible parametric model of the source is also proposed. Each algorithm is evaluated against recently published methods from the literature.
The combined system demonstrates that speech can be parametrically extracted from a clean or moderately noisy recording, with the further option of modifying the reconstruction to implement various desirable effects.
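
For readers unfamiliar with the unscented transform named in the abstract, the following is a minimal scalar sketch of the general technique, not the thesis's segmentation algorithm. It approximates the mean and variance of a nonlinear function of a Gaussian variable by passing a few deterministically chosen "sigma points" through the function. The function name `unscented_transform`, the scalar setting, and the default `kappa=2.0` are assumptions made for illustration only.

```python
import math


def unscented_transform(f, mean, var, kappa=2.0):
    """Approximate the mean and variance of f(x) for scalar x ~ N(mean, var).

    Illustrative sketch only: scalar state, basic (kappa-only) weighting.
    """
    n = 1  # state dimension (scalar case, an assumption of this sketch)
    spread = math.sqrt((n + kappa) * var)

    # Sigma points: the mean itself plus one symmetric pair around it.
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]

    # Propagate each sigma point through the nonlinearity.
    ys = [f(x) for x in points]

    # Recombine with the weights to get the transformed mean and variance.
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


if __name__ == "__main__":
    # Squaring a Gaussian with mean 1.0 and variance 0.25.
    print(unscented_transform(lambda x: x * x, 1.0, 0.25))
```

In a state-space setting the same idea is applied to vector-valued states and covariance matrices, which lets a Kalman-style filter handle nonlinear models without computing derivatives.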
Keywords/Search Tags: Sound, Speech, Flexible