
Perceptual audio coding that scales to low bitrates

Posted on: 2008-12-11
Degree: Ph.D.
Type: Thesis
University: New Mexico State University
Candidate: Kandadai, Srivatsan A
GTID: 2448390005451858
Subject: Engineering
Abstract/Summary:
A perceptually scalable audio coder generates a bitstream that contains layers of audio fidelity, encoded so that adding each layer improves the reconstructed audio by an amount that is just noticeable to the listener. Such algorithms have applications such as music on demand at variable levels of fidelity over 3G and 4G cellular networks, since these standards support operation at different bit rates. Although the MPEG-4 (Moving Picture Experts Group) natural audio coder can create scalable bitstreams, its perceptual quality at low bit rates is poor. In contrast, the nonscalable transform-domain weighted interleaved vector quantization (Twin VQ) coder performs well at low bit rates. In this research, we present a technique that modifies the Twin VQ algorithm so that it generates a perceptually scalable bitstream with many fine-grained layers of audio fidelity. Using Twin VQ as the base ensures good perceptual quality at low bit rates (8 to 16 kbit/s), unlike the bit-sliced arithmetic coding (BSAC) used in MPEG-4.

In this thesis, we first present the Twin VQ algorithm along with our technique for reverse engineering it. From the reverse-engineered Twin VQ information, we build a scalable audio coder that performs as well as Twin VQ at low bit rates in human subjective testing. The residual signals generated by the successive quantization strategy developed here are shown to have statistical properties similar to those of independent Laplacian random variables, so we can apply a lattice VQ that takes advantage of the spherically invariant random vectors (SIRVs) formed by such random variables. In particular, the lattice VQ gives us more control over the layering of the bitstream at higher rates.

We also note that the layers of audio fidelity in the compressed representation must be stored and transmitted in a perceptually optimal order. To accomplish this, we use an objective metric that draws on subjective test results and psychoacoustic principles to quantify audio quality. This metric is used to optimize the ordering of the fidelity layers so that the transition from lower to higher bit rates is perceptually seamless.
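The abstract describes quantizing successive residuals with a lattice VQ to build layers, but it does not say which lattice or step schedule the thesis uses. The sketch below is only an illustration under stated assumptions: it uses the D_n lattice (integer vectors with an even coordinate sum, with Conway and Sloane's nearest-point rule) and a hypothetical multistage scheme in which each layer quantizes the remaining residual with half the previous step. The names `nearest_dn`, `layered_quantize`, and `layer_mse`, the starting step of 2.0, and the synthetic Laplacian data are all hypothetical; the sketch does not model the SIRV-based shaping the thesis exploits.

```python
import random


def nearest_dn(x):
    """Return the nearest point of the D_n lattice to x.

    D_n is the set of integer vectors whose coordinates sum to an even
    number. Conway and Sloane's rule: round every coordinate; if the sum
    is odd, re-round the coordinate with the largest rounding error in
    the opposite direction.
    """
    f = [round(v) for v in x]
    if sum(f) % 2 == 0:
        return f
    # The coordinate farthest from its rounded value is the cheapest to flip.
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f


def layered_quantize(x, step, layers):
    """Hypothetical multistage scheme: each layer quantizes the residual
    left by earlier layers on a D_n lattice scaled by `step`, then the
    step is halved for the next layer. Returns the reconstruction after
    each layer."""
    residual = list(x)
    recon = [0.0] * len(x)
    out = []
    for _ in range(layers):
        q = [step * c for c in nearest_dn([r / step for r in residual])]
        recon = [a + b for a, b in zip(recon, q)]
        residual = [r - c for r, c in zip(residual, q)]
        out.append(recon)
        step /= 2
    return out


def layer_mse(vectors, step=2.0, layers=4):
    """Mean squared error per coordinate after 1, 2, ..., `layers` layers."""
    totals = [0.0] * layers
    for v in vectors:
        for k, recon in enumerate(layered_quantize(v, step, layers)):
            totals[k] += sum((a - b) ** 2 for a, b in zip(v, recon))
    count = len(vectors) * len(vectors[0])
    return [t / count for t in totals]


if __name__ == "__main__":
    random.seed(0)
    # Stand-in for coder residuals: i.i.d. Laplacian samples (scale 1),
    # drawn as a random sign times an exponential variate.
    data = [[random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(8)]
            for _ in range(500)]
    print(layer_mse(data))
```

Because the zero vector is a lattice point, each layer can only leave a vector's residual error unchanged or make it smaller, so the printed per-layer MSE is non-increasing.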
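The abstract states that an objective metric is used to optimize the order of the fidelity layers, but it does not describe the optimization procedure. As an assumption, the sketch below uses a greedy rule: among the layers whose prerequisite layer has already been placed, append the one that raises the metric most per bit. The prerequisite structure, the bit costs, and `toy_metric` (per-band quality that saturates as refinement layers are added) are hypothetical stand-ins, not the thesis's psychoacoustic metric.

```python
from collections import Counter


def order_layers(layers, metric):
    """Greedy layer ordering: repeatedly append the layer whose addition
    raises `metric` the most per bit, among layers whose prerequisite has
    already been placed.

    layers: dict mapping layer id -> (bits, prerequisite id or None)
    metric: callable mapping a set of layer ids to a quality score
    """
    order, included = [], set()
    remaining = dict(layers)
    while remaining:
        ready = [n for n, (_, pre) in remaining.items()
                 if pre is None or pre in included]
        if not ready:
            raise ValueError("unsatisfiable prerequisites")
        base = metric(included)
        best = max(ready,
                   key=lambda n: (metric(included | {n}) - base) / remaining[n][0])
        order.append(best)
        included.add(best)
        del remaining[best]
    return order


# Hypothetical stand-in for an objective quality metric: each band's
# contribution saturates as its refinement layers are added.
BAND_WEIGHT = {"low": 3.0, "mid": 2.0, "high": 1.0}


def toy_metric(included):
    levels = Counter(band for band, _ in included)
    return sum(w * (1 - 0.5 ** levels[b]) for b, w in BAND_WEIGHT.items())


# Hypothetical layers: (band, refinement level) -> (bits, prerequisite).
LAYERS = {
    ("low", 1): (100, None), ("low", 2): (100, ("low", 1)),
    ("mid", 1): (100, None), ("mid", 2): (100, ("mid", 1)),
    ("high", 1): (80, None),
}

if __name__ == "__main__":
    print(order_layers(LAYERS, toy_metric))
```

A greedy gain-per-bit rule is a common heuristic for this kind of ordering; it is not guaranteed to find the order that maximizes quality at every truncation point of the bitstream.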
Keywords/Search Tags: Audio, Bit, Rates, Perceptual, Low, Twin VQ, Layers, Scalable