Research On Low Bit Rate Speech Coding Based On Perception

Posted on: 2017-01-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W J He
GTID: 1108330503485222
Subject: Information and Communication Engineering
Abstract/Summary:
The general strategy of low bit rate speech coding is to separate speech from non-speech and encode each with designed codebooks. Existing low bit rate speech coding algorithms place much emphasis on exploiting temporal and spatial redundancy, while the characteristics of human speech perception are ignored in both detection and compression. At present, low bit rate speech coders operating at 600 bps~2.4 kbps, which are based on multi-frame joint coding and vector quantization (VQ), perform well in high signal-to-noise ratio (SNR) environments. However, their performance and stability deteriorate seriously in low SNR environments, and the problems of large storage requirements, prohibitively high complexity and long delay remain. In view of this, this thesis incorporates subjective and objective factors of speech perception into the design of a low bit rate speech coder. First, robust speech detection algorithms for low SNR environments are studied, drawing on knowledge of how human speech perception differs across frequency. Second, the research focuses on designing low-complexity codebooks that better represent the speech source space, from the two aspects of subjective perception of speech by the human listener and objective sensing of the information structure of speech by the encoder. Both lines of work aim to improve the performance and robustness of the encoder from the perspectives of detection and compression. The main contributions are as follows:
1. To improve voice activity detection (VAD) accuracy in low SNR environments, a robust VAD algorithm based on sub-band adaptive reserved likelihood ratios (LRs) with sub-band double features is proposed. It aims to alleviate the false alarm problem that statistical-model likelihood ratio test (LRT) methods exhibit when detecting non-speech signals.
A reserved weight is employed in the likelihood ratio decision rule and is determined by the double speech features of the sub-bands. The reserved threshold is estimated adaptively from past VAD results and their sub-band feature parameters. Experiments on various low SNR scenarios show promising performance compared with similar algorithms: the VAD correct rate is improved by 0.96%~15.91%, 1.54%~17.96% and 0.65%~11.44% in white noise at 10 dB, 0 dB and -10 dB respectively, and by 2%~18.27%, 2.90%~11.86% and 0.18%~3.65% in non-stationary babble noise at 10 dB, 0 dB and -10 dB respectively. The method is also applied in a 2.4 kbps low bit rate coder, where the perceptual evaluation of speech quality (PESQ) score is improved by 0.159, 0.157 and 0.186 in babble noise and by 0.153, 0.098 and 0.096 in white noise.
2. An adaptive orthogonal M-split codebook generation method is proposed to improve how well the initial codebook represents the source space. The method splits one code word into multiple new code words using adaptive split coefficient vectors and sets the increments to be orthogonal, aiming to reduce the number of iterations of the subsequent clustering. Experiments show that, compared with the universal codebook generation algorithm, the proposed algorithm achieves a reduction of 18%~45% in designing codebooks of size 64~2048 with almost equal VQ performance.
3. A VQ codebook design method for line spectral frequency (LSF) parameters based on human perception is proposed from the perspective of speech comprehension by the human ear, aiming to solve the unbalanced allocation of coding resources. The method divides the source space of the LSF residual into multiple regions using non-standard elliptic equations and adjusts the allocation of coding resources by changing the proportion of training data in different regions, based on the fact that transitional speech segments are more important for comprehension by the human ear.
Experimental results show that the PESQ score is improved by 0.03 and 0.02 for male and female speech respectively. Furthermore, to reduce codebook capacity and quantization delay, an improved adaptive multi-scale lattice vector quantization based on global nonuniform and local uniform (GNLU) is proposed, building on the idea of designing codebooks in different regions. Experimental results show that the improved method gives a reduction of 60%~100% in codebook capacity and 69%~80% in quantization delay, offering a compromise among delay, storage requirement and VQ performance.
4. The compression and reconstruction performance of speech signal parameters is studied using compressed sensing (CS) theory, from the aspect of objective speech perception based on the information structure or content of the speech signal. First, the sparsity of LSF parameters on an orthogonal basis is analyzed, and the compression and reconstruction performance is studied with an overcomplete dictionary. Second, an LSF optimization algorithm at the decoder based on sparse representation is proposed, aiming to reduce distortion using prior knowledge of the quantization error. Experimental results show that the ASD of the optimized LSF is reduced by 0.3~1.8%.
5. Finally, the above results on VAD and perceptual codebook design are integrated, and a 500 bps ultra-low bit rate speech coding algorithm based on perception is proposed. Experimental results show that, compared with the algorithm proposed by the Chinese Academy of Sciences in 2013, the PESQ score is improved by 0.201~0.141 in a noise-free environment.
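As a point of reference for contribution 1, the conventional statistical-model LRT decision that the proposed VAD sets out to improve can be sketched roughly as below. This is a minimal sketch of the textbook complex-Gaussian LRT, not the thesis's sub-band reserved-weight rule, whose details are not given in this abstract. The function names, the fixed a priori SNR `xi` and the zero threshold are illustrative assumptions.

```python
import math


def log_likelihood_ratio(power, noise_var, xi):
    """Log likelihood ratio of one frequency bin (speech vs. noise only).

    Assumes the usual complex-Gaussian model for the DFT coefficient:
    gamma = power / noise_var is the a posteriori SNR and xi is the
    a priori SNR (taken as a fixed, assumed value in this sketch).
    """
    gamma = power / noise_var
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def lrt_vad(frame_power, noise_var, xi=1.0, threshold=0.0):
    """Classify one frame; returns (is_speech, score).

    frame_power: per-bin power spectrum of the frame
    noise_var:   per-bin noise variance estimate (same length)
    The score is the mean of the per-bin log likelihood ratios,
    which is compared against a fixed (assumed) threshold.
    """
    if len(frame_power) != len(noise_var) or not frame_power:
        raise ValueError("frame_power and noise_var must be non-empty and equal in length")
    llrs = [
        log_likelihood_ratio(p, n, xi)
        for p, n in zip(frame_power, noise_var)
    ]
    score = sum(llrs) / len(llrs)
    return score > threshold, score


if __name__ == "__main__":
    noise = [1.0] * 8                      # flat unit noise floor (toy data)
    print(lrt_vad([1.0] * 8, noise))       # frame at the noise level
    print(lrt_vad([10.0] * 8, noise))      # frame 10 dB above the noise
```

With a flat unit noise floor and `xi=1.0`, a frame at the noise level scores 0.5 - ln 2 (about -0.19, so non-speech), while a frame ten times stronger scores about 4.31 and is flagged as speech. A fixed threshold of this kind is what tends to produce false alarms in low SNR conditions, which is the problem the adaptive reserved threshold described above is meant to address.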
Keywords/Search Tags: Low bit rate speech coding, Perception, Robust voice activity detection, Vector quantization