A Research Of Quadruple-channel Extremely Low Bitrate Dynamic Codec For Voice

Posted on: 2020-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Huo
Full Text: PDF
GTID: 2428330596995060
Subject: Computer Science and Technology
Abstract/Summary:
Digital audio is widely used in daily life because it offers good quality and convenience. With the development of the internet and smartphones, huge amounts of digital voice data are produced and transferred through applications such as telephony, voice mail, voice translation, story recording, and voice remote control. To reduce the pressure on transmission bandwidth and storage space, these voice signals need to be compressed before they are transmitted or stored. As computers have become faster, many modern audio codecs with good perceptual quality and low storage requirements have been developed since the 1990s. Most of these codecs are designed for general-purpose use. Compared with general audio, voice has a much simpler structure and can therefore potentially be compressed to a lower bitrate. However, very few codecs make full use of this property to optimize voice compression. This thesis therefore proposes an audio codec for solo voice recordings that aims to produce compressed voice with acceptable, natural perceptual quality at an extremely low bitrate.

First, the thesis reviews existing audio codecs and analyzes their strengths and shortcomings. A four-channel model is then proposed to decompose voice signals so that they can be compressed intuitively and effectively. The four channels are the fundamental frequency, the envelope spectrum of the harmonic peak spectrum, the colored noise spectrum, and the initial phases. These three spectra are compressed with a non-linear online dictionary learning method. Using the Hilbert transform, the thesis proposes an efficient method for tracking the shift of the target function along the frequency axis, which makes the dictionary learning method much more flexible. This novel method not only uses weighted atoms to represent the target function but also shifts atoms along the frequency axis to fit the target function more smoothly. The dictionary is further optimized with a multi-pass scheme: at the cost of higher coding latency, a better dictionary gives better fitting results with fewer fitting parameters, which lowers the bitrate. In addition, to make atom matching and atom indexing more efficient, a Least Recently Used (LRU) algorithm is used to limit the dictionary size.

All floating-point parameters are approximated by integers, and the precision of each representation is carefully designed: the more sensitive parameters are encoded with higher accuracy and the others with lower accuracy. To match the characteristics of each parameter at the lowest bitrate, some parameters are recorded as differences, while others use non-uniform coding. Finally, a dynamic Huffman entropy coding method is proposed. By adjusting the Huffman tree during encoding and decoding, this method avoids explicitly transmitting the prior probability table and is also applicable to stream encoding.

The experimental results show that, with the coding sample rate kept at 24 kHz, the proposed codec compresses the voice stream to an average of 1 kbit/s. The proposed codec is applicable not only to general-purpose voice transmission but also to scenarios that demand extremely narrow bandwidth, e.g., submarine, satellite, or emergency telephone communication. Compared with other low-bitrate codecs, the proposed codec preserves a higher sample rate and thus provides better perceptual quality.
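
The abstract mentions that floating-point parameters are represented by integers and that some of them are recorded as differences. The following is a minimal Python sketch of that general idea only, not the thesis's actual scheme: the function names, the 0.5 Hz quantization step, and the sample fundamental-frequency track are all assumptions made for illustration.

```python
from typing import List


def quantize(values: List[float], step: float) -> List[int]:
    """Map each float to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(codes: List[int], step: float) -> List[float]:
    """Recover approximate floats from integer codes."""
    return [c * step for c in codes]


def delta_encode(codes: List[int]) -> List[int]:
    """Keep the first code, then store each code as the difference from its predecessor."""
    if not codes:
        return []
    return [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    out: List[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical fundamental-frequency track in Hz (illustrative values only).
f0_track = [220.0, 221.0, 223.0, 222.0, 220.0]
STEP_HZ = 0.5  # assumed quantization step, not taken from the thesis

codes = quantize(f0_track, STEP_HZ)   # [440, 442, 446, 444, 440]
deltas = delta_encode(codes)          # [440, 2, 4, -2, -4]
restored = dequantize(delta_decode(deltas), STEP_HZ)
```

For a slowly varying parameter, the differences after the first value cluster near zero, so an entropy coder such as the Huffman coder described above can assign them short codewords. A coarser step would shrink the codes further at the cost of accuracy, which is the precision tradeoff the abstract describes.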
Keywords/Search Tags: Low-bitrate voice encoding, Incremental dictionary learning, Sparse encoding, Entropy encoding, Differential encoding