A formant-based linear prediction speech synthesis/analysis system

Posted on: 1996-10-19
Degree: Ph.D.
Type: Thesis
University: University of Florida
Candidate: Shue, Yean-Jen
Full Text: PDF
GTID: 2468390014487895
Subject: Engineering
Abstract/Summary:
The aim of this research was to develop a speech synthesis/analysis system as the framework for generating high-fidelity synthetic speech and for psychoacoustic studies. A formant-based linear prediction (LP) synthesizer, along with a robust speech analysis procedure, was developed to achieve this aim. The major feature of this system is its ability to adapt the formant and linear prediction schemes to represent voiced and unvoiced sounds, respectively. The advantages of employing two kinds of schemes in one synthesis system are that (1) the formant scheme is physically meaningful for simulating the human speech production system, and (2) the LP scheme is able to reproduce the spectrum of all speech sounds.

The formant-based LP synthesizer uses two types of sources, voiced and unvoiced, to form the excitation part of the synthesizer. These sources are either nonparametric waveforms or parametric models of waveforms. The vocal tract is characterized by a twelfth-order linear prediction filter. For voiced sounds, the coefficients of the vocal tract filter are determined by the first six formants. The counterparts for unvoiced sounds are obtained by means of a twelfth-order LP analysis. This synthesizer can resynthesize speech almost perfectly when the estimated glottal waveform from a glottal inverse filtering process is used as the excitation source. When the modeled waveform is used as the excitation source, the synthesized speech is natural and intelligible.

The other feature of this research is that the interaction between the synthesis and analysis is closely defined. A two-phase, LP-based procedure that analyzes a segment of the speech signal was developed to estimate the time-varying synthesis parameters for the formant-based LP synthesizer, such as the voiced/unvoiced classification, fundamental frequency, signal power, formants (for voiced sounds), LP coefficients (for unvoiced sounds), and the estimated glottal waveforms.

Based on the synthesis and analysis procedures, as well as knowledge of the relationships between vocal quality and glottal features, a voice conversion procedure that reproduces the vocal tract component, but varies the glottal features, was developed to convert a segment of the speech signal of modal voice type to five other voice types (vocal fry, breathy, falsetto, whisper, and harsh). The conversion procedure provides a systematic method for examining the relationships between vocal quality and glottal features, and can be used to build a database of various voice types for training a speech recognition system.

In addition to the glottal source parameters, the vocal tract parameters can be manipulated by our synthesis/analysis system as well. Since the features of the glottal source and the vocal tract are both involved in speech studies such as gender conversion, speaker identification, and speech recognition, this synthesis/analysis system can serve as a tool for future applications.
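The abstract states that, for voiced sounds, the twelfth-order vocal tract filter is determined by the first six formants, but it does not give the mapping. One common way to do this, assumed here and not confirmed as the thesis's method, is to treat each formant as a second-order resonator (a complex-conjugate pole pair) and multiply the six sections into one all-pole polynomial. The sketch below illustrates that idea in Python with NumPy. The function names, the sampling rate, the formant frequencies, the bandwidths, and the impulse-train excitation are all hypothetical values chosen for illustration.

```python
import numpy as np

def formants_to_lp(formants_hz, bandwidths_hz, fs):
    """Build all-pole coefficients a[0..2N] from N formants (assumed mapping).

    Each formant (F, B) becomes a pole pair with radius exp(-pi*B/fs)
    and angle 2*pi*F/fs, i.e. the section 1 - 2 r cos(theta) z^-1 + r^2 z^-2.
    """
    a = np.array([1.0])
    for f, b in zip(formants_hz, bandwidths_hz):
        r = np.exp(-np.pi * b / fs)
        theta = 2.0 * np.pi * f / fs
        section = np.array([1.0, -2.0 * r * np.cos(theta), r * r])
        a = np.convolve(a, section)  # polynomial multiplication
    return a

def all_pole_filter(a, excitation):
    """Run y[n] = x[n] - sum_{k=1..p} a[k] * y[n-k] (direct form, a[0] == 1)."""
    p = len(a) - 1
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Hypothetical example values, not taken from the thesis.
fs = 10000.0                                     # sampling rate in Hz
formants_hz = [500.0, 1500.0, 2500.0, 3500.0, 4000.0, 4500.0]
bandwidths_hz = [60.0, 90.0, 120.0, 150.0, 200.0, 250.0]

a = formants_to_lp(formants_hz, bandwidths_hz, fs)  # 13 coefficients, order 12

# Crude voiced excitation: an impulse train at 100 Hz (stand-in for a glottal source).
excitation = np.zeros(1000)
excitation[::100] = 1.0
y = all_pole_filter(a, excitation)
```

Six pole pairs yield exactly twelve poles, which matches the twelfth-order filter described above. Because every pole radius is below one for a positive bandwidth, the resulting filter is stable under this mapping. A realistic voiced source would replace the impulse train with a glottal waveform, estimated or modeled.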
Keywords/Search Tags:Speech, System, Linear prediction, Formant-based, Vocal tract, Glottal