Research On Key Technologies Of Waveform Interpolation Speech Coding

Posted on: 2008-04-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Y Qi
GTID: 1118360215494820
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In modern communication systems, speech is the most important and fundamental means of communication, and it is commonly transmitted as a compressed bit stream. Because of factors such as cost, efficiency, physical channel capacity, and storage space, it is desirable to reduce the bit rate required to transmit a speech signal while keeping good speech quality. This bit-rate reduction process is known as speech coding.
Speech coding at low bit rates has been applied to wireless mobile communications, VoIP, voice mail, secure communications, and satellite communications. As the next generation of wireless and internet systems develops, more applications and services based on speech coding are likely to be offered. In recent years, increasing attention has been paid to high-quality speech coding at low rates, so achieving communication quality at low bit rates is an important research problem.
The waveform interpolation (WI) technique is a promising scheme for high-quality speech coding at low bit rates and has been researched extensively in recent years. Building on existing research, this dissertation focuses on key technologies in WI coding at low bit rates and proposes several improved algorithms. Finally, a low-complexity improved waveform interpolation (LIWI) speech coder is presented. The main research results are as follows:
1. To reduce the computational complexity of the WI coder, improved fast algorithms for characteristic waveform (CW) representation and CW alignment are proposed, using the Fast Fourier Transform (FFT), cubic B-spline interpolation, and period extension. The computational complexity is reduced by a factor of about 5 relative to the original algorithms. Moreover, the interpolation and quantization of the CW become more reasonable.
2. A secondary power normalization algorithm is proposed. This normalization ensures that the energies of the Slowly Evolving Waveform (SEW) and the Rapidly Evolving Waveform (REW) sum to 1, so the energy ratio of SEW to REW can be obtained from the SEW energy alone. This ratio is used in REW quantization and CW composition.
3. More efficient quantization schemes are proposed for the SEW magnitude, the REW magnitude, and the power parameters. First, by applying Equivalent Rectangular Bandwidth (ERB) theory, a classified multi-codebook method, and the analysis-by-synthesis (AbS) approach, among others, a predictive AbS multi-codebook SEW magnitude quantization scheme is proposed, in which pitch information determines which codebook is searched. Second, for REW magnitude quantization, a DCT-matrix multi-codebook quantization scheme is proposed, in which the codebook classification is based on the pitch and the quantized SEW power. The multi-codebook structure provides more information for quantization and eases the limit on the available bits, at the cost of some extra storage space. Furthermore, for switched quantization of the CW gain, a new classification parameter is proposed. The parameter represents the smoothness of the evolution of speech energy, and the method efficiently improves the precision of gain quantization for voice onsets and transitional segments. These schemes can greatly enhance the quality of the reconstructed speech.
4. At the decoder, speech is classified according to the energy ratio of SEW to REW, and a dynamic weighted CW composition method is proposed. The weight of the SEW is directly proportional to the energy ratio, and the weight of the REW is inversely proportional to it. The method benefits the description of unvoiced speech in the WI coder and enhances the quality of the reconstructed speech.
5. Based on the sigmoid function, an improved pitch interpolation algorithm is proposed, and a flaw in the original approach when interpolating certain special pitch values is corrected.
6. A new method for voiced/unvoiced/silence classification of speech using a Support Vector Machine (SVM) is presented. The method can effectively classify speech frames as voiced, unvoiced, or silence under various signal-to-noise ratios. Based on this method, a voice activity detection algorithm that is robust in various noise environments is proposed.
7. Finally, a low-complexity, high-quality WI speech coding algorithm at 2 kb/s is developed. Performance tests have been conducted covering speech quality, algorithm complexity, and storage space. Diagnostic Rhyme Test (DRT) results indicate that the articulation of the reconstructed Chinese speech is excellent. Mean Opinion Score (MOS) and subjective A/B test results indicate that the proposed coder greatly outperforms the MELP (Mixed Excitation Linear Prediction) coder at 2.4 kb/s and comes very close to the FS1016 CELP (Code Excited Linear Prediction) coder at 4.8 kb/s. The computational complexity of the LIWI coder is about 91.254 MOPS, and the codebooks required by the codec occupy about 78K float storage units.
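The idea behind the secondary power normalization in result 2 can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the abstract does not say how energy is measured or how the scaling is applied, so the sum-of-squares energy measure, the single common gain, and the function names `energy`, `secondary_normalize`, and `sew_rew_ratio_from_sew` are all assumptions made for illustration. The sketch only shows why, once the two energies sum to 1, the SEW energy alone is enough to recover the SEW/REW energy ratio.

```python
import math


def energy(mags):
    """Sum of squared magnitudes (an assumed energy measure)."""
    return sum(m * m for m in mags)


def secondary_normalize(sew, rew):
    """Scale SEW and REW by one common gain so their energies sum to 1.

    Hypothetical helper: the abstract only states the sum-to-1 property,
    not how the normalization is carried out.
    """
    total = energy(sew) + energy(rew)
    if total <= 0.0:
        raise ValueError("SEW and REW are both zero; nothing to normalize")
    gain = 1.0 / math.sqrt(total)
    return [gain * m for m in sew], [gain * m for m in rew], gain


def sew_rew_ratio_from_sew(sew_energy):
    """Recover E(SEW) / E(REW) from the SEW energy alone.

    Valid only after normalization, where E(REW) = 1 - E(SEW).
    """
    if not 0.0 <= sew_energy < 1.0:
        raise ValueError("normalized SEW energy must lie in [0, 1)")
    return sew_energy / (1.0 - sew_energy)


if __name__ == "__main__":
    # Toy magnitude vectors, made up purely for illustration.
    sew_n, rew_n, gain = secondary_normalize([3.0, 0.0], [0.0, 4.0])
    print("energy sum:", energy(sew_n) + energy(rew_n))
    print("ratio from SEW only:", sew_rew_ratio_from_sew(energy(sew_n)))
    print("ratio computed directly:", energy(sew_n) / energy(rew_n))
```

Under these assumptions a decoder would need only the quantized SEW energy to obtain the ratio, which is consistent with the abstract's statement that the ratio is reused in REW quantization and CW composition.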
Keywords/Search Tags: speech coding, waveform interpolation, characteristic waveform, vector quantization