Matching pursuits sinusoidal speech coding

Posted on: 2002-02-05
Degree: Ph.D
Type: Thesis
University: University of California, Santa Barbara
Candidate: Etemoglu, Cagri Ozgenc
Full Text: PDF
GTID: 2468390011497261
Subject: Engineering
Abstract/Summary:
Various applications require speech to be transmitted and stored efficiently in digital form, which has led to increased research and development of speech compression algorithms. One of the goals of the ongoing standardization efforts is to achieve toll quality at rates around 4 kbps.

The main objective of this dissertation is to devise novel models and techniques for speech coding in order to achieve high quality at 4 kbps or below. We address the challenges in achieving high quality through the development of a new sinusoidal speech coder that incorporates these novel models and techniques.

This dissertation introduces a sinusoidal modeling technique for low bit rate speech coding wherein the parameters for each sinusoidal component are sequentially extracted by a closed-loop analysis. The sinusoidal modeling of the speech linear prediction (LP) residual is performed within the general framework of matching pursuits with a dictionary of sinusoids. The frequency space of the sinusoids is restricted to sets of frequency intervals, or bins, which in conjunction with the closed-loop analysis allows us to map the frequencies of the sinusoids into a frequency vector that is efficiently quantized. In voiced frames, two sets of frequency vectors are generated: one represents the harmonically related components of the voiced segment and the other represents the non-harmonically related components. This approach eliminates the need for voicing information, which is difficult to estimate correctly and to quantize at low bit rates. In transition frames, to efficiently extract and quantize the set of frequencies needed for the sinusoidal representation of the LP residual, we introduce Frequency Bin Vector Quantization (FBVQ). FBVQ selects a vector of nonuniformly spaced frequencies from a frequency codebook in order to represent the frequency domain information in transition regions. Our use of FBVQ with closed-loop searching, combined with modeling of the perceptually important phase information, contributes to an improvement of speech quality in transition frames.

To develop a successful sinusoidal coder, we need to efficiently quantize the parameters of the model. The other significant task of this dissertation is therefore parameter quantization. Since in sinusoidal coders the spectral magnitudes usually consume a large share of the available bits and their faithful reproduction is crucial for toll quality, some of the research effort is devoted to developing efficient vector quantization (VQ) techniques. Specifically, this dissertation presents a group of novel structured VQ techniques characterized by the use of transformations for quantizing the input vector. The transformations are selected from a family of transformations, represented by a codebook of matrices. Experimental results based on spectral magnitude quantization show that the proposed VQ techniques outperform multistage vector quantization (MSVQ) at the same bit rate.

In this dissertation we adopt a phonetic-class-based parametric coding approach to achieve high quality coded speech at a low rate. The coder incorporates multimode analysis/synthesis, novel matching pursuits based voiced and transitional speech modeling, robust parameter estimation, and efficient spectral quantization techniques. The resulting 4 kbps coder is called the matching pursuits sinusoidal speech coder. Subjective listening tests indicate that it achieves a perceptual quality slightly exceeding that of the G.723.1 coder at 6.3 kbps.
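The idea of sequentially extracting sinusoids from a residual, with frequencies restricted to bins, can be illustrated with a minimal Python sketch. This is not the coder described in the abstract: the function names, the uniform bin layout, the grid search inside each bin, the least-squares fit of amplitude and phase, and the rule of at most one sinusoid per bin are all assumptions made for illustration. LP analysis, perceptual weighting, and quantization are omitted.

```python
import numpy as np


def sinusoid_pair(freq, n):
    """Cosine and sine columns at `freq` (radians per sample), length n."""
    t = np.arange(n)
    return np.column_stack((np.cos(freq * t), np.sin(freq * t)))


def matching_pursuit_sinusoids(residual, bin_edges, num_components,
                               grid_per_bin=8):
    """Greedy extraction of sinusoids, at most one per frequency bin.

    Hypothetical helper: each step searches a grid of candidate
    frequencies in every unused bin, keeps the candidate that removes
    the most energy from the current residual, and subtracts it.
    Returns a list of (bin_index, freq, amplitude, phase) and the
    final residual.
    """
    r = np.asarray(residual, dtype=float).copy()
    n = r.size
    used_bins = set()
    components = []
    for _ in range(num_components):
        best = None
        for b in range(len(bin_edges) - 1):
            if b in used_bins:
                continue
            grid = np.linspace(bin_edges[b], bin_edges[b + 1],
                               grid_per_bin, endpoint=False)
            for freq in grid:
                basis = sinusoid_pair(freq, n)
                # Least-squares fit of a*cos + b*sin to the residual.
                coef, *_ = np.linalg.lstsq(basis, r, rcond=None)
                captured = float(np.sum((basis @ coef) ** 2))
                if best is None or captured > best[0]:
                    best = (captured, b, freq, coef)
        if best is None:
            break  # every bin already holds a sinusoid
        _, b, freq, coef = best
        r -= sinusoid_pair(freq, n) @ coef
        used_bins.add(b)
        # a*cos(wt) + b*sin(wt) = A*cos(wt + phi)
        amplitude = float(np.hypot(coef[0], coef[1]))
        phase = float(np.arctan2(-coef[1], coef[0]))
        components.append((b, float(freq), amplitude, phase))
    return components, r


# Toy frame standing in for an LP residual: two sinusoids, 160 samples.
t = np.arange(160)
frame = 1.0 * np.cos(0.5 * t + 0.3) + 0.5 * np.cos(1.3 * t - 1.0)
edges = np.linspace(0.2, 3.0, 15)  # 14 uniform bins (assumed layout)
found, leftover = matching_pursuit_sinusoids(frame, edges, 2)
print(found)
```

Because each selected sinusoid falls in a distinct bin, the list of bin indices acts as a compact description of the frequencies. The abstract's frequency vector plays a similar role, although its exact construction is not given here.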
Keywords/Search Tags:Speech, Sinusoidal, Matching pursuits, Quality, Coder, Vector quantization, Coding, Kbps