
Embedding perceptual linear prediction models in speech and audio coding

Posted on: 2007-09-28
Degree: Ph.D.
Type: Thesis
University: Arizona State University
Candidate: Atti, Venkatraman S
Full Text: PDF
GTID: 2448390005978925
Subject: Engineering
Abstract/Summary:
The application of perceptual models in speech and audio coding began receiving attention in the late 1970s. Even today, the methods that exploit the masking properties of the human ear in speech coding standards are largely based on relatively old concepts introduced by Schroeder and Atal in 1979. This dissertation studies a series of problems encountered in applying new perceptual models to prediction-based speech and audio coding algorithms. It also explores different ways of integrating advanced human auditory models into fixed and variable bit rate standardized vocoders.

Specific problems addressed in this dissertation include: (1) the significance of auditory excitation patterns in speech analysis and synthesis, (2) the performance of a perceptual loudness metric in variable bit rate speech coding, and (3) adaptive pole estimation algorithms for linear prediction (LP) analysis in cascade form. The investigation of these problems resulted in two new algorithms for use in wideband speech coding and in variable bit rate speech coders.

The first is the perceptually-motivated all-pole (PMAP) modeling algorithm. The PMAP algorithm is based on an approach for estimating the perceptually relevant pole locations. The estimated perceptual poles are used to construct an all-pole filter for use in speech analysis. The proposed PMAP approach is compared against existing perceptually based linear prediction methods, namely perceptual LP and warped LP. PMAP modeling (1) provides a way to integrate psychoacoustic principles into LP by using auditory excitation pattern (AEP) matching, (2) enables estimation and perceptual ranking of the speech formants, and (3) provides an LP prediction residual with lower perceptual loudness. Computational profiling of the PMAP algorithm identified its most expensive modules; in particular, the AEP-matching search accounted for the majority of the computational complexity. A fast-PMAP modeling algorithm that employs a block form of AEP matching was therefore developed. By making use of the properties of the parametric spreading function and its energy-preserving smearing operation, the AEPs are estimated recursively. This recursive estimation of excitation patterns reduced the computation significantly (by over 50%). Experiments that compare the performance of the fast-PMAP algorithm with that of the original PMAP algorithm are included.

The second algorithm is perceptual-loudness (PL) based rate determination. Unlike existing rate selection strategies, which are based on a voice activity detector and energy thresholds, the proposed method employs a perceptual loudness measure. The enhanced variable rate codec is used as the test bed for evaluating the performance of the PL-based rate selection strategy. Experimental results demonstrate that the proposed PL-based rate determination compares well against energy-based rate selection techniques in terms of average bit rate and speech quality. A fast PL-based rate selection algorithm that employs LP analysis-driven pre-filtering followed by partial loudness estimation is also proposed.
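The abstract states that the estimated perceptual poles are used to construct an all-pole filter. The generic construction step can be sketched as follows, as an illustration rather than the thesis's implementation. The function names `all_pole_from_poles` and `magnitude_response` and the pole values are hypothetical, and the sketch assumes the poles are already available; it does not reproduce the AEP-matching pole estimation.

```python
import numpy as np


def all_pole_from_poles(poles):
    """Return coefficients a[0..p] of A(z) for the all-pole filter H(z) = 1 / A(z).

    A(z) = prod_i (1 - p_i z^-1) = a[0] + a[1] z^-1 + ... + a[p] z^-p, with a[0] = 1.
    Hypothetical helper: the poles are assumed to be given.
    """
    poles = np.asarray(poles, dtype=complex)
    if np.any(np.abs(poles) >= 1.0):
        raise ValueError("all poles must lie strictly inside the unit circle")
    # np.poly expands the monic polynomial whose roots are the poles.
    a = np.poly(poles)
    # Conjugate pole pairs yield real coefficients; drop negligible imaginary parts.
    return np.real_if_close(a)


def magnitude_response(a, n_freqs=256):
    """Evaluate |H(e^{jw})| = 1 / |A(e^{jw})| on n_freqs points in [0, pi]."""
    w = np.linspace(0.0, np.pi, n_freqs)
    k = np.arange(len(a))
    # A(e^{jw}) = sum_k a[k] * exp(-j * w * k)
    A = np.exp(-1j * np.outer(w, k)) @ a
    return w, 1.0 / np.abs(A)


# Illustrative values only: one resonance at pi/4 rad/sample with radius 0.95.
pole = 0.95 * np.exp(1j * np.pi / 4)
a = all_pole_from_poles([pole, np.conj(pole)])
w, mag = magnitude_response(a)
print("denominator coefficients:", a)
print("peak frequency (rad/sample):", w[np.argmax(mag)])
```

Working from poles rather than from direct-form coefficients keeps each resonance individually accessible, which fits the abstract's mention of LP analysis in cascade form and of ranking formants; the thesis's actual ranking criterion is not shown here.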
Keywords/Search Tags: Speech, Perceptual, Coding, Models, Rate, Linear prediction, PMAP algorithm, Estimation