
Research On Very Low Bit Rate Speech Coding Algorithm

Posted on: 2015-09-02 | Degree: Master | Type: Thesis
Country: China | Candidate: J L Zhang
GTID: 2308330464466670 | Subject: Communication and Information System
Abstract/Summary:
In recent years, with the rapid development of digital communication technology, frequency resources have become increasingly valuable. Lowering the bit rate of speech coding improves channel utilization and saves cost, so it has considerable guiding significance and practical value, and low bit rate speech coding is likely to be one of the main directions in which speech coding technology develops. The mixed excitation linear prediction (MELP) algorithm is one of the most promising low bit rate speech coding algorithms. MELP is built on the linear prediction model, combines it with multi-band ideas, and adds five key techniques, which together improve the quality of the synthetic speech significantly.

This thesis takes MELP, a U.S. federal standard, as its basic research object. Starting from the principle and implementation of the algorithm, we analyze it systematically and design a very low bit rate speech coding algorithm at 0.6 kb/s. We also analyze in detail how the individual parameters influence the quality of the synthetic speech, and we improve the pitch and voicing-state processing in order to raise that quality.

The 0.6 kb/s algorithm is built around a super-frame. Based on the MELP model, we increase the analysis frame length to 200 samples and group four consecutive frames (sub-frames) into one super-frame, so that the speech parameters are transmitted one super-frame at a time; each super-frame is quantized with a total of 60 bits. Because each of the four sub-frames is either voiced or unvoiced, a super-frame has 16 possible voicing combinations. According to their probabilities, these combinations are divided into four patterns, which fall into two broad classes, unvoiced and voiced, and the bits for each parameter are allocated optimally according to the class. For unvoiced super-frames, the pitch is not transmitted.
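As a quick check of the super-frame arithmetic: at an assumed 8 kHz sampling rate (the usual narrowband rate for MELP, not stated in the abstract), a 200-sample frame lasts 25 ms, a four-frame super-frame lasts 100 ms, and 60 bits per 100 ms is 600 b/s. The sketch below computes this, enumerates the 16 voicing combinations, and ranks combinations by how often they occur in a sequence of per-frame voicing decisions. The thesis does not list which combinations form each of the four patterns, so the sketch stops at the frequency ranking such a grouping would be based on; all helper names are hypothetical.

```python
from collections import Counter
from itertools import product

SAMPLE_RATE_HZ = 8000        # assumption: 8 kHz narrowband input, not stated in the abstract
FRAME_SAMPLES = 200          # frame length used by the 0.6 kb/s coder
FRAMES_PER_SUPERFRAME = 4    # four sub-frames per super-frame
BITS_PER_SUPERFRAME = 60     # total bit budget per super-frame


def bit_rate(bits, frame_samples, frames, sample_rate=SAMPLE_RATE_HZ):
    """Bits per second when `bits` cover `frames` frames of `frame_samples` samples each."""
    # Multiplying before dividing keeps the result exact for these integer inputs.
    return bits * sample_rate / (frame_samples * frames)


def voicing_combinations(frames=FRAMES_PER_SUPERFRAME):
    """All voiced (1) / unvoiced (0) assignments for the sub-frames of one super-frame."""
    return list(product((0, 1), repeat=frames))


def rank_combinations(decisions, frames=FRAMES_PER_SUPERFRAME):
    """Group per-frame voicing flags into super-frames and rank combinations by frequency."""
    usable = len(decisions) - len(decisions) % frames   # drop an incomplete trailing super-frame
    superframes = [tuple(decisions[i:i + frames]) for i in range(0, usable, frames)]
    return Counter(superframes).most_common()


if __name__ == "__main__":
    print(bit_rate(BITS_PER_SUPERFRAME, FRAME_SAMPLES, FRAMES_PER_SUPERFRAME))  # 600.0
    # For comparison, standard MELP sends 54 bits per 180-sample frame.
    print(bit_rate(54, 180, 1))                                                 # 2400.0
    print(len(voicing_combinations()))                                          # 16
    flags = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]                   # toy decisions
    print(rank_combinations(flags)[0])                                          # ((1, 1, 1, 1), 2)
```

The same `bit_rate` helper makes it easy to see how the longer frame and the four-frame grouping trade off against the 60-bit budget.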
In an unvoiced super-frame, the 10 LSF parameters of each sub-frame are quantized into 11 bits with a one-stage vector quantizer. For a voiced super-frame, the pitch values of the last three sub-frames are uniformly quantized and transmitted, while the pitch of the first sub-frame is not transmitted. Only the LSFs of the second and fourth sub-frames are transmitted; the 10 LSFs of each of these sub-frames are quantized into 11 bits with a one-stage vector quantizer. The gains make the amplitude of the synthetic speech match that of the original signal: two gains are extracted from each of the second and fourth sub-frames, and the resulting four gains are combined into a four-dimensional vector that is vector-quantized into 10 bits. The remaining bits carry the subband voicing decisions, of which the first (lowest) subband decision must always be transmitted. At the decoder, the mode of the super-frame is first determined from the first subband voicing decision of each sub-frame, and the four sets of sub-frame parameters are then restored through interpolation. By applying vector quantization over the super-frame in this way, the bit rate of the standard MELP algorithm is reduced to 0.6 kb/s. Test results show that the synthetic speech of the 0.6 kb/s coder scores 2.18 in PESQ, 0.534 lower than the standard algorithm, but its intelligibility remains high. To improve the quality of the synthetic speech, the algorithm is then refined in its pitch extraction and voicing decisions. First, an additional voicing state, the transition state, is introduced in the encoder. Treating unvoiced speech as voiced has little effect on the quality of the synthetic speech, whereas treating voiced speech as unvoiced degrades it; therefore, the linear prediction coefficients of transition frames are extracted with an adaptive window so that they carry more voiced information.
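Returning to the voiced-mode bit allocation described above, the pieces can be sketched as follows: a one-stage (single-codebook) vector quantizer with 2^11 = 2048 code vectors for the 10 LSFs, a uniform scalar quantizer for pitch, and decoder-side linear interpolation that rebuilds the LSFs of sub-frames 1 and 3 from the transmitted sub-frames 2 and 4. This is a sketch under assumptions, not the thesis's implementation: the random codebooks stand in for trained ones, plain Euclidean distance replaces any weighting the coder may use, and the 20 to 160 sample pitch range, the 7-bit pitch quantizer, the equal interpolation weights, the use of the previous super-frame's fourth sub-frame as an anchor, and all function names are hypothetical.

```python
import random

LSF_ORDER = 10
LSF_BITS = 11                    # 10 LSFs -> one 11-bit index, as in the abstract
GAIN_BITS = 10                   # 4-D gain vector -> one 10-bit index, as in the abstract
PITCH_MIN, PITCH_MAX = 20, 160   # assumption: pitch search range in samples
PITCH_BITS = 7                   # hypothetical bit count per transmitted pitch value


def toy_codebook(bits, dim, lo, hi, seed=0, sort=False):
    """Hypothetical stand-in for a trained codebook: 2**bits random vectors in [lo, hi]."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(2 ** bits):
        v = [rng.uniform(lo, hi) for _ in range(dim)]
        vectors.append(sorted(v) if sort else v)   # sorting keeps LSF vectors ascending
    return vectors


def vq_encode(vector, codebook):
    """One-stage VQ: index of the code vector closest in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])))


def uniform_quantize(value, lo, hi, bits):
    """Clamp value to [lo, hi] and map it to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    value = min(max(value, lo), hi)
    return round((value - lo) / (hi - lo) * (levels - 1))


def uniform_dequantize(index, lo, hi, bits):
    """Inverse of uniform_quantize: level index back to a value in [lo, hi]."""
    return lo + index * (hi - lo) / (2 ** bits - 1)


def interpolate(a, b, weight=0.5):
    """Linear interpolation between two parameter vectors."""
    return [(1 - weight) * x + weight * y for x, y in zip(a, b)]


def decode_voiced_lsfs(prev_lsf4, lsf2, lsf4):
    """Rebuild four sub-frame LSF vectors from the two that were transmitted.

    prev_lsf4 is the fourth sub-frame LSF vector of the previous super-frame
    (a hypothetical choice of anchor for interpolating sub-frame 1).
    """
    return [interpolate(prev_lsf4, lsf2), lsf2, interpolate(lsf2, lsf4), lsf4]


if __name__ == "__main__":
    lsf_cb = toy_codebook(LSF_BITS, LSF_ORDER, 0.0, 0.5, sort=True)  # normalized frequency (assumed)
    gain_cb = toy_codebook(GAIN_BITS, 4, 10.0, 70.0, seed=1)         # gains in dB (assumed)
    print(vq_encode(lsf_cb[123], lsf_cb))                 # 123: a code vector maps to itself
    print(vq_encode([40.0, 42.0, 38.0, 41.0], gain_cb))   # some index in [0, 1023]
    p = uniform_quantize(57.3, PITCH_MIN, PITCH_MAX, PITCH_BITS)
    print(p, uniform_dequantize(p, PITCH_MIN, PITCH_MAX, PITCH_BITS))
    print(decode_voiced_lsfs([0.0] * 10, [0.2] * 10, [0.4] * 10)[0][:3])  # [0.1, 0.1, 0.1]
```

The same `vq_encode` call serves the four-dimensional gain vector with a 2^10-entry codebook. A real coder would train both codebooks on speech data (for example with the LBG algorithm) rather than draw them at random.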
The second improvement concerns pitch extraction: the input signal is upsampled before the pitch is extracted. Upsampling brings the digital signal closer to the original analog signal, which improves the accuracy of the extracted pitch and hence the quality of the synthesized speech. The results show that improving the pitch extraction and the linear prediction analysis of transition frames raises the PESQ score of the standard algorithm by 0.061.
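The effect of upsampling on pitch resolution can be illustrated with a normalized-autocorrelation pitch search run on an upsampled signal: upsampling by a factor L lets the lag search move in 1/L-sample steps of the original signal. This sketch assumes linear interpolation as a simple stand-in for a proper band-limited interpolation filter, a 20 to 160 sample lag range, and an upsampling factor of 4; none of these details are given in the abstract, and the function names and test signal are hypothetical.

```python
import math


def upsample_linear(x, factor):
    """Insert factor - 1 linearly interpolated samples between neighbouring samples."""
    out = []
    for a, b in zip(x, x[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(x[-1])
    return out


def normalized_autocorr(x, lag):
    """Correlation between x and x delayed by lag, normalized to [-1, 1]."""
    a, b = x[:len(x) - lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def estimate_pitch(x, min_lag=20, max_lag=160, factor=1):
    """Pitch period in original-rate samples, with a resolution of 1/factor sample."""
    y = upsample_linear(x, factor) if factor > 1 else list(x)
    best_lag, best_r = min_lag * factor, -math.inf
    for lag in range(min_lag * factor, max_lag * factor + 1):
        r = normalized_autocorr(y, lag)
        # Require a clear improvement so that, among near-equal peaks, the shortest
        # lag wins instead of a multiple of the true period.
        if r > best_r + 1e-9:
            best_lag, best_r = lag, r
    return best_lag / factor


if __name__ == "__main__":
    # Hypothetical voiced excitation: one decaying oscillation per 50-sample period,
    # tiled so the signal is exactly periodic.
    period = [math.exp(-k / 6.0) * math.sin(0.9 * k) for k in range(50)]
    x = period * 12
    print(estimate_pitch(x, factor=1))   # 50.0
    print(estimate_pitch(x, factor=4))   # 50.0, found with a quarter-sample search grid
```

For a true period that is not a whole number of samples, the factor-4 search can land on quarter-sample values that an integer-lag search cannot represent. Linear interpolation keeps the sketch dependency-free; a windowed-sinc or polyphase interpolator would track the underlying analog waveform more closely.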
Keywords/Search Tags: Low Bit Rate, Speech Coding, MELP, Super-frame, Quality