
Toward a high-quality singing synthesizer with vocal texture control

Posted on: 2003-03-25
Degree: Ph.D.
Type: Thesis
University: Stanford University
Candidate: Lu, Hui-Ling
Full Text: PDF
GTID: 2465390011488599
Subject: Engineering
Abstract/Summary:
Spectral modeling and physical modeling have both been used to achieve high-quality singing synthesis. However, spectral models are known to be difficult to articulate and are therefore relatively limited in expressivity. Physical models, on the other hand, have parameters that are not straightforward to adjust so as to reproduce a specific recording. In this thesis, a high-quality singing synthesizer is proposed, together with an analysis procedure that retrieves the model parameters automatically from the desired voices. Since over 90% of singing is voiced sound, the research focuses on improving the naturalness of the vowel tone quality. In addition, an intuitive parametric model is developed to control the vocal textures of the synthetic voices, ranging from “pressed” through “normal” to “breathy” phonation.

To balance the complexity of the model against that of the corresponding analysis procedure, a source-filter synthesis model is proposed. Based on a simplified human voice production system, the source-filter synthesis model describes the human voice as the output of a vocal tract filter excited by a glottal excitation. The vocal tract is modeled as an all-pole filter, a form that has often been used to model non-nasal voiced sound. To accommodate variations in vocal texture, the glottal excitation model employs two elements: a parametric derivative glottal wave and modulated aspiration noise. The derivative glottal wave is given by the transformed Liljencrants-Fant (LF) model. The aspiration noise is represented as pitch-synchronous, amplitude-modulated Gaussian noise.

A major contribution of this thesis is an analysis procedure that estimates the parameters of the proposed synthesis model so that it mimics the desired voices. First, a source-filter deconvolution algorithm based on convex optimization techniques is proposed to estimate the vocal tract filter from sound recordings. Second, the inverse-filtered glottal excitation is decomposed into a smoothed derivative glottal wave and a noise residual component using wavelet packet analysis, from which suitable parameterizations of the glottal excitation can then be found. Finally, baritone recordings are analyzed to construct a parametric model for controlling vocal textures in synthesized singing.
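
The source-filter structure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. Several parts are assumptions made only for illustration: a Rosenberg-style pulse stands in for the transformed LF model, the noise envelope simply follows the glottal flow, and the formant frequencies and bandwidths are rough values for an /a/-like vowel. All function and parameter names (`glottal_flow_period`, `all_pole_coeffs`, `all_pole_filter`, `synthesize`, `breathiness`) are hypothetical.

```python
import math
import random


def glottal_flow_period(n, open_quotient=0.6, speed_quotient=2.0):
    """One pitch period of a Rosenberg-style glottal flow pulse.

    Assumption: this simple pulse is a stand-in for the LF model.
    """
    n_open = int(n * open_quotient)
    n_rise = max(1, int(n_open * speed_quotient / (1.0 + speed_quotient)))
    n_fall = max(1, n_open - n_rise)
    flow = []
    for i in range(n):
        if i < n_rise:                    # opening phase
            flow.append(0.5 * (1.0 - math.cos(math.pi * i / n_rise)))
        elif i < n_rise + n_fall:         # closing phase
            flow.append(math.cos(0.5 * math.pi * (i - n_rise) / n_fall))
        else:                             # closed phase
            flow.append(0.0)
    return flow


def all_pole_coeffs(formants, fs):
    """Build A(z) as a product of second-order resonator sections."""
    a = [1.0]
    for freq, bandwidth in formants:
        r = math.exp(-math.pi * bandwidth / fs)   # pole radius
        theta = 2.0 * math.pi * freq / fs         # pole angle
        section = [1.0, -2.0 * r * math.cos(theta), r * r]
        out = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):                # polynomial multiplication
            for j, sj in enumerate(section):
                out[i + j] += ai * sj
        a = out
    return a


def all_pole_filter(x, a):
    """Apply y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def synthesize(f0=110.0, fs=16000, n_periods=50, breathiness=0.1, seed=0):
    """Glottal excitation (pulse derivative + modulated noise) through an all-pole filter."""
    rng = random.Random(seed)
    period = int(round(fs / f0))
    flow = glottal_flow_period(period)
    # Derivative glottal wave as a first difference; flow[-1] wraps to the
    # closed phase of the previous period, where the flow is zero.
    dflow = [flow[i] - flow[i - 1] for i in range(period)]
    peak = max(abs(v) for v in dflow)
    excitation = []
    for _ in range(n_periods):
        for i in range(period):
            # Assumed envelope: more aspiration noise while the glottis is open.
            envelope = 0.2 + 0.8 * flow[i]
            noise = breathiness * peak * envelope * rng.gauss(0.0, 1.0)
            excitation.append(dflow[i] + noise)
    # Illustrative (frequency Hz, bandwidth Hz) pairs, not values from the thesis.
    a = all_pole_coeffs([(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)], fs)
    return all_pole_filter(excitation, a)
```

Calling `synthesize(breathiness=0.0)` gives a purely periodic excitation, while larger `breathiness` values mix in more pitch-synchronous noise. This only loosely gestures at the texture control described above, which in the thesis also involves the parameters of the glottal wave itself.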
Keywords/Search Tags:Singing, Model, Vocal, Glottal excitation, Synthesis