
Research on Multi-Band Excitation and Linear Predictive Low-Rate Speech Coding

Posted on: 2012-11-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y X Liang
GTID: 1118330338450116
Subject: Military communications science
Abstract:
With the rapid development of wireless communication, subscriber demand is increasing quickly, leaving fewer and fewer spectrum resources available. An important way to address this problem is to improve spectral efficiency. Speech communication, whose objectives are a low bit rate and high quality, is a fundamental service in wireless communication. The Multi-Band Excitation (MBE) algorithm is a typical representative of low-rate, high-quality speech coding. Because it is a parametric coding method, its bit rate is lower than that of waveform coding, while its fine sub-band segmentation yields accurate Unvoiced/Voiced (U/V) decisions and therefore high-quality synthetic speech.

The work of this dissertation consists of investigations of Multimode MBE with Linear Prediction and of improvements to vector quantization. The main contributions are as follows.

1. A Multimode MBE with Linear Prediction vocoder is proposed. Compared with existing MBE coders, it improves on two points. First, it solves the problem of variable-dimension vector quantization of the spectral amplitudes: the variable-dimension spectral amplitudes are converted to fixed-dimension Linear Predictive coefficients, which are represented and quantized as Line Spectral Frequency (LSF) parameters, so that quantization precision is guaranteed. Second, a novel sub-band segmentation method and a U/V decision threshold are designed. The number of sub-bands in a frame is fixed, a U/V decision is made in each sub-band, and the frame is then classified into a mode. Because the LSF vectors have different statistical distributions in different modes, they are quantized with mode-specific codebooks to improve quantization quality. Moreover, an energy-dependent adaptive threshold, much simpler than the one used in MBE, is used for the U/V decision. Simulation results indicate that the unvoiced and voiced parts of the synthetic speech are distinguishable and consistent with the original speech, and that the spectrum of the synthetic speech closely fits the original one.

2. A Moving Average Multi-Stage Split Vector Quantizer (MA-MS-SVQ) is developed to quantize the LSF parameters. It fully exploits the inter-frame and intra-frame correlations of the LSF coefficients and reduces both the storage and the search complexity of the codebooks. After their mean is removed, the LSF coefficients are predicted by a first-order moving average predictor, and the residual LSF coefficients are then quantized by a three-stage vector quantizer. In the second stage, each high-dimensional LSF vector is split into two low-dimensional parts that are quantized separately. Simulation results show that, under low-bit-rate speech coding conditions, the average spectral distortion of the synthetic speech is 0.91 dB, the percentage of outliers between 2 dB and 4 dB is 0.13%, and there are no outliers above 4 dB. Moreover, both codebook storage and search complexity are reduced by more than 31%.

3. A Most Dispersed and Greedy Tree Growing Algorithm (MD-GTGA) is proposed to generate the initial codebook for the Linde-Buzo-Gray (LBG) algorithm. MD-GTGA addresses the drawback of the LBG algorithm that it is easily trapped in local optima. In this algorithm, a fundamental codebook is first generated by the Greedy Tree Growing Algorithm (GTGA), and an initial codebook is then obtained from it by the Most Dispersed Codewords in Initialization (MDCI) algorithm. The Random, Split, GTGA and MDCI algorithms are compared by simulation; among them, GTGA gives the smallest Average Spectral Distortion (ASD) of the synthetic speech. Compared with the GTGA and MDCI algorithms, MD-GTGA reduces both the average distortion and the ASD.

4. An Improved Pairwise Nearest Neighbors (IPNN) algorithm is constructed to generate the LBG initial codebook. In this algorithm, a preparative codebook is chosen either randomly or by the Split algorithm, and the training vectors are then merged one by one, using the PNN algorithm, into the nearest codeword of the preparative codebook. Simulation results show that the proposed algorithm has a shorter run time than the PNN algorithm, and that the final codebook obtained from a Split preparative codebook is more stable than one obtained from a random preparative codebook. Tests of the IPNN algorithm show that the average spectral distortion of the synthetic speech is around 1 dB, the percentage of outliers between 2 dB and 4 dB is less than 2%, and there are no outliers above 4 dB.
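Contribution 1 converts the variable number of harmonic spectral amplitudes into a fixed-order linear predictor. The abstract does not give the conversion procedure; a common approach, assumed here, is to treat the harmonic amplitudes as a line power spectrum, compute its autocorrelation, and run the Levinson-Durbin recursion. The order p = 10, the white-noise correction factor and all function names are illustrative assumptions, not the dissertation's code.

```python
import math


def harmonic_autocorr(amps, w0, order):
    """Autocorrelation R(0..order) of a line spectrum with power amps[k]**2
    at harmonic frequency (k + 1) * w0 (radians per sample)."""
    return [sum(a * a * math.cos(n * (k + 1) * w0) for k, a in enumerate(amps))
            for n in range(order + 1)]


def levinson_durbin(r, order):
    """Return ([1, a1, ..., ap], error) for A(z) = 1 + sum_i a_i z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                 # stays > 0 while |k| < 1
    return a, err


def amplitudes_to_lpc(amps, w0, order=10, wn_corr=1.0001):
    """Map any number of harmonic amplitudes to `order` LP coefficients.
    wn_corr is an assumed white-noise correction that keeps R well conditioned."""
    r = harmonic_autocorr(amps, w0, order)
    r[0] *= wn_corr
    return levinson_durbin(r, order)


# Two frames with different pitch at 8 kHz sampling: 40 vs. 20 harmonics,
# but both yield an 11-element coefficient vector [1, a1, ..., a10].
w_low = 2 * math.pi * 100 / 8000
w_high = 2 * math.pi * 200 / 8000
a_low, _ = amplitudes_to_lpc([1.0 / (k + 1) for k in range(40)], w_low)
a_high, _ = amplitudes_to_lpc([1.0] * 20, w_high)
print(len(a_low), len(a_high))   # 11 11
```

The point of the conversion is visible in the usage lines: the harmonic count depends on pitch, while the predictor order does not, so a single fixed-dimension quantizer can follow.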
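Contribution 1 then represents the fixed-order predictor by LSF parameters. As a sketch (not the dissertation's implementation), the LSFs can be taken as the angles of the unit-circle roots of the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), which interleave for a minimum-phase A(z). Here they are located by a grid search followed by bisection. The sketch assumes an even order p and the sign convention A(z) = 1 + sum a_i z^-i; the grid size is an arbitrary choice.

```python
import math


def lpc_to_lsf(a, grid=1024, iters=50):
    """LSFs (radians, ascending) of A(z) = 1 + sum_i a[i] z^-i, even order p."""
    p = len(a) - 1
    ext = list(a) + [0.0]                           # A(z) padded to degree p + 1
    rev = ext[::-1]                                 # z^-(p+1) A(1/z)
    sum_poly = [x + y for x, y in zip(ext, rev)]    # P(z), symmetric
    diff_poly = [x - y for x, y in zip(ext, rev)]   # Q(z), antisymmetric
    half = (p + 1) / 2.0

    # Real-valued forms of P and Q on the unit circle (linear phase removed).
    def f_p(w):
        return sum(c * math.cos(w * (i - half)) for i, c in enumerate(sum_poly))

    def f_q(w):
        return sum(c * math.sin(w * (half - i)) for i, c in enumerate(diff_poly))

    ws = [math.pi * m / grid for m in range(1, grid)]   # open interval (0, pi)
    roots = []
    for f in (f_p, f_q):
        vals = [f(w) for w in ws]
        for m in range(len(ws) - 1):
            if vals[m] * vals[m + 1] < 0:           # sign change brackets a root
                lo, hi, f_lo = ws[m], ws[m + 1], vals[m]
                for _ in range(iters):              # bisection
                    mid = 0.5 * (lo + hi)
                    f_mid = f(mid)
                    if f_lo * f_mid <= 0:
                        hi = mid
                    else:
                        lo, f_lo = mid, f_mid
                roots.append(0.5 * (lo + hi))
    return sorted(roots)


# For A(z) = 1 with p = 10, the LSFs are k * pi / 11 for k = 1..10.
print([round(w, 4) for w in lpc_to_lsf([1.0] + [0.0] * 10)])
```

The trivial roots at 0 (of Q) and pi (of P) are excluded by searching only the open interval, which is why exactly p frequencies come back.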
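Contribution 1 also fixes the number of sub-bands, makes a U/V decision in each band against an energy-dependent threshold, and then assigns the frame a mode. The abstract gives neither the band layout nor the threshold formula, so everything in this sketch is hypothetical: four equal-width bands, an MBE-style normalized fitting error between the original spectrum and a voiced (harmonic) model as the per-band measure, a threshold that rises linearly with frame energy relative to a running average, and the count of voiced bands as the mode index that would select an LSF codebook.

```python
import math

NUM_BANDS = 4  # hypothetical fixed number of sub-bands per frame


def band_errors(spec, voiced_model, num_bands=NUM_BANDS):
    """Normalized error between spectral magnitudes and a voiced (harmonic)
    model, computed separately in num_bands equal-width groups of bins."""
    n = len(spec)
    edges = [round(b * n / num_bands) for b in range(num_bands + 1)]
    errs = []
    for lo, hi in zip(edges, edges[1:]):
        num = sum((s - v) ** 2 for s, v in zip(spec[lo:hi], voiced_model[lo:hi]))
        den = sum(s * s for s in spec[lo:hi]) or 1e-12
        errs.append(num / den)
    return errs


def adaptive_threshold(frame_energy, avg_energy, base=0.2, slope=0.1,
                       lo=0.05, hi=0.5):
    """Hypothetical rule: louder-than-average frames get a looser threshold."""
    ratio_db = 10.0 * math.log10(max(frame_energy, 1e-12) / max(avg_energy, 1e-12))
    return min(hi, max(lo, base + slope * ratio_db / 10.0))


def classify_frame(spec, voiced_model, frame_energy, avg_energy):
    """Per-band U/V flags and a mode index (number of voiced bands)."""
    thr = adaptive_threshold(frame_energy, avg_energy)
    voiced = [e < thr for e in band_errors(spec, voiced_model)]
    mode = sum(voiced)  # 0..NUM_BANDS; would index a mode-specific codebook
    return voiced, mode


spec = [1.0] * 64
model = [1.0] * 32 + [0.0] * 32      # harmonic model fits only the low half
print(classify_frame(spec, model, frame_energy=1.0, avg_energy=1.0))
# ([True, True, False, False], 2)
```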
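Contribution 2 removes the LSF mean and predicts the result with a first-order moving average (MA) predictor, so that only the prediction residual is quantized. A minimal sketch, assuming per-coefficient MA weights and a pluggable residual quantizer (any callable; in the full coder this role is played by the multi-stage split VQ). The class name, the weights and the mean vector are illustrative assumptions. Because the prediction is formed from the previous quantized residual, a decoder running the same recursion stays in step with the encoder.

```python
class MAPredictiveLSFCoder:
    """First-order MA prediction of mean-removed LSF vectors.

    Prediction for frame n is b * rq[n-1], where rq is the quantized residual.
    """

    def __init__(self, mean, ma_coeffs, quantize):
        self.mean = list(mean)
        self.b = list(ma_coeffs)
        self.quantize = quantize          # residual quantizer (callable)
        self.prev_rq = [0.0] * len(mean)  # quantized residual of previous frame

    def encode(self, lsf):
        """Return (quantized residual, reconstructed LSF vector)."""
        pred = [b * r for b, r in zip(self.b, self.prev_rq)]
        residual = [x - m - p for x, m, p in zip(lsf, self.mean, pred)]
        rq = self.quantize(residual)
        self.prev_rq = list(rq)
        recon = [m + p + r for m, p, r in zip(self.mean, pred, rq)]
        return rq, recon


def coarse(v):
    """Stand-in quantizer (rounding to 0.01) in place of the real VQ."""
    return [round(x, 2) for x in v]


coder = MAPredictiveLSFCoder(mean=[0.3, 0.7, 1.2], ma_coeffs=[0.5, 0.5, 0.5],
                             quantize=coarse)
for frame in ([0.31, 0.74, 1.18], [0.33, 0.75, 1.21]):
    rq, recon = coder.encode(frame)
    print(rq, [round(x, 3) for x in recon])
```

The reconstruction error equals the residual quantization error, so the predictor changes what is quantized without adding error of its own.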
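Contribution 2 quantizes the residual in three stages and splits the second-stage vector into two lower-dimensional parts. The sketch shows a generic sequential (greedy) search for such a multi-stage split VQ: each stage quantizes what the earlier stages left over, and a split stage searches each sub-vector in its own small codebook. The codebook sizes, the 10-dimensional vectors, the 5 + 5 split and the random codebooks are assumptions for illustration; the dissertation's trained codebooks, bit allocation and any multi-candidate search are not reproduced.

```python
import random


def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((x - y) ** 2 for x, y in zip(v, codebook[i])))


def msvq_encode(x, stages):
    """stages: list of stages; each stage is a list of (codebook, start, end)
    parts covering dimensions start..end-1. Returns indices and final residual."""
    residual = list(x)
    indices = []
    for parts in stages:
        for codebook, start, end in parts:
            i = nearest(codebook, residual[start:end])
            indices.append(i)
            for k, c in zip(range(start, end), codebook[i]):
                residual[k] -= c
    return indices, residual


def msvq_decode(indices, stages, dim):
    """Sum the selected codewords of every stage and part."""
    out = [0.0] * dim
    it = iter(indices)
    for parts in stages:
        for codebook, start, end in parts:
            for k, c in zip(range(start, end), codebook[next(it)]):
                out[k] += c
    return out


rng = random.Random(0)


def rand_cb(size, dim, scale):
    """Random Gaussian codebook, a placeholder for a trained one."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(size)]


DIM = 10
stages = [
    [(rand_cb(64, DIM, 0.10), 0, DIM)],                              # stage 1
    [(rand_cb(32, 5, 0.05), 0, 5), (rand_cb(32, 5, 0.05), 5, DIM)],  # stage 2, split
    [(rand_cb(32, DIM, 0.02), 0, DIM)],                              # stage 3
]
x = [rng.gauss(0.0, 0.1) for _ in range(DIM)]
idx, res = msvq_encode(x, stages)
xq = msvq_decode(idx, stages, DIM)
```

The storage saving from splitting is easy to count: the split second stage spends 10 bits on two 32-entry, 5-dimensional codebooks (320 stored numbers), whereas one unsplit 10-bit, 10-dimensional codebook would store 1024 x 10 = 10240 numbers and require a correspondingly larger search.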
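The spectral-distortion figures quoted in contributions 2 and 4 (average SD, percentage of 2-4 dB outliers, percentage above 4 dB) correspond to the usual LSF-quantization metric: the RMS difference, in dB, between the LPC power spectra 1/|A(e^jw)|^2 before and after quantization, computed per frame and then summarized over frames. This sketch assumes a uniform frequency grid over the whole band (0, pi); some works evaluate only part of the band, and the abstract does not state which range was used.

```python
import cmath
import math


def lpc_power_db(a, w):
    """10*log10 of the LPC model power 1/|A(e^{jw})|^2, A(z) = sum a_i z^-i."""
    resp = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
    return -10.0 * math.log10(abs(resp) ** 2)


def spectral_distortion(a, a_hat, n_points=256):
    """RMS log-spectral difference in dB over a uniform grid on (0, pi)."""
    ws = [math.pi * (k + 0.5) / n_points for k in range(n_points)]
    sq = [(lpc_power_db(a, w) - lpc_power_db(a_hat, w)) ** 2 for w in ws]
    return math.sqrt(sum(sq) / n_points)


def sd_statistics(frame_pairs):
    """(average SD in dB, % frames with 2 <= SD <= 4 dB, % frames with SD > 4 dB)."""
    sds = [spectral_distortion(a, a_hat) for a, a_hat in frame_pairs]
    n = len(sds)
    return (sum(sds) / n,
            100.0 * sum(2.0 <= d <= 4.0 for d in sds) / n,
            100.0 * sum(d > 4.0 for d in sds) / n)


a = [1.0, -0.9]
a_q = [1.0, -0.88]          # a slightly perturbed ("quantized") predictor
print(sd_statistics([(a, a), (a, a_q)]))
```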
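Contribution 3 first builds a fundamental codebook with the Greedy Tree Growing Algorithm (GTGA). The general idea of greedy tree growing is to start from a single cell and, at each step, split only the cell whose distortion is currently largest, rather than doubling every cell as the binary Split initialization does. This sketch assumes squared-Euclidean distortion and splits a cell with a short 2-means run seeded by two far-apart members; the dissertation's exact splitting rule is not given in the abstract, and the function names are hypothetical.

```python
def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vs):
    return [sum(col) / len(vs) for col in zip(*vs)]


def cell_distortion(vs):
    c = centroid(vs)
    return sum(sqdist(v, c) for v in vs)


def split_cell(vs, iters=10):
    """Split one cell with a short 2-means run seeded by two far-apart members.
    Returns (g1, g2), or None if no valid split was found."""
    c = centroid(vs)
    c1 = max(vs, key=lambda v: sqdist(v, c))
    c2 = max(vs, key=lambda v: sqdist(v, c1))
    best = None
    for _ in range(iters):
        g1, g2 = [], []
        for v in vs:
            (g1 if sqdist(v, c1) <= sqdist(v, c2) else g2).append(v)
        if not g1 or not g2:
            break
        best = (g1, g2)
        c1, c2 = centroid(g1), centroid(g2)
    return best


def gtga(train, size):
    """Greedy tree growing: always split the cell with the largest distortion."""
    cells = [list(train)]
    while len(cells) < size:
        costs = [cell_distortion(cell) for cell in cells]
        i = max(range(len(cells)), key=costs.__getitem__)
        if costs[i] == 0.0:
            break                      # only identical vectors are left
        parts = split_cell(cells[i])
        if parts is None:
            break
        cells[i:i + 1] = list(parts)
    return [centroid(cell) for cell in cells]


train = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(sorted(gtga(train, 2)))   # [[0.0, 0.5], [10.0, 10.5]]
```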
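MD-GTGA then selects the initial LBG codebook from the larger fundamental codebook using MDCI. The abstract does not define "most dispersed", so this sketch assumes a greedy farthest-point (max-min distance) selection started from the codeword farthest from the codebook mean, followed by standard LBG (generalized Lloyd) refinement with a relative-improvement stopping rule. The names, the starting rule and the tolerance are assumptions.

```python
def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vs):
    return [sum(col) / len(vs) for col in zip(*vs)]


def most_dispersed(fundamental, size):
    """Greedy max-min selection of `size` codewords (assumed reading of MDCI)."""
    assert size <= len(fundamental)
    mean = centroid(fundamental)
    chosen = [max(fundamental, key=lambda c: sqdist(c, mean))]
    mind = [sqdist(c, chosen[0]) for c in fundamental]
    while len(chosen) < size:
        j = max(range(len(fundamental)), key=mind.__getitem__)
        chosen.append(fundamental[j])
        mind = [min(m, sqdist(c, fundamental[j])) for m, c in zip(mind, fundamental)]
    return [list(c) for c in chosen]


def lbg(train, init, eps=1e-4, max_iter=100):
    """Generalized Lloyd iterations; returns (codebook, mean distortion of the
    last nearest-neighbour pass)."""
    cb = [list(c) for c in init]
    prev = float("inf")
    for _ in range(max_iter):
        cells = [[] for _ in cb]
        total = 0.0
        for v in train:
            d, i = min((sqdist(v, c), i) for i, c in enumerate(cb))
            cells[i].append(v)
            total += d
        for i, cell in enumerate(cells):
            if cell:                       # empty cells keep their codeword
                cb[i] = centroid(cell)
        if total == 0.0 or (prev - total) / total < eps:
            break
        prev = total
    return cb, total / len(train)


fundamental = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 5]]
train = [(0, 0), (0, 1), (10, 10), (10, 11)]
init = most_dispersed(fundamental, 2)      # [[10, 11], [0, 0]]
cb, dist = lbg(train, init)
print(sorted(cb), dist)                    # [[0.0, 0.5], [10.0, 10.5]] 0.25
```

Picking widely separated starting codewords is one way to keep LBG from starting with several codewords crowded into the same region, which is the local-optimum problem contribution 3 targets.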
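Contribution 4 builds on the Pairwise Nearest Neighbor (PNN) method. For reference, classic PNN starts with every training vector as its own cluster and repeatedly merges the two clusters whose merge increases the total squared error the least; that increase is n_a n_b / (n_a + n_b) times the squared distance between the two centroids. This sketch shows only the classic baseline in its naive O(n^3) form. The IPNN preparative-codebook and one-by-one merging steps are described too briefly in the abstract to reproduce, so they are not attempted here.

```python
def merge_cost(ca, na, cb, nb):
    """Increase in total squared error if clusters a and b are merged."""
    return na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))


def pnn(vectors, size):
    """Classic PNN: returns a list of (centroid, count) pairs."""
    assert size >= 1
    clusters = [([float(x) for x in v], 1) for v in vectors]
    while len(clusters) > size:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(*clusters[i], *clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ca, na), (cb, nb) = clusters[i], clusters[j]
        merged = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[i] = (merged, na + nb)
        del clusters[j]                  # j > i, so index i is unaffected
    return clusters


result = pnn([(0, 0), (0, 1), (10, 10), (10, 11)], 2)
print(sorted(result))   # [([0.0, 0.5], 2), ([10.0, 10.5], 2)]
```

The cubic cost of this exhaustive pair search is what makes plain PNN slow on large training sets, which is the run-time issue contribution 4 reports improving.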
Keywords: speech coding, vector quantization, initial codebook, Pairwise Nearest Neighbors algorithm, Greedy Tree Growing Algorithm, Most Dispersed Codewords in Initialization algorithm, Multimode Multi-Band Excitation, Linear Prediction