
Low Bit Rate LPC-MBE Speech Coding Algorithm Based On Speech Enhancement And Dyadic Wavelet Transform Pitch Detection

Posted on: 2005-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Dong
Full Text: PDF
GTID: 2168360125450299
Subject: Communication and Information System
Abstract/Summary:
Introduction

Speech coding technology has advanced rapidly in recent years. At present, the theory and technology of low bit rate speech coding at rates above 2.4 kb/s are fairly mature and have been standardized by several international and regional organizations. Research in speech coding is now concentrated on the study and realization of very low bit rate coding techniques and algorithms below 2.4 kb/s.

Because of the simple excitation model of Linear Predictive Coding (LPC) and the large number of bits needed to represent the excitation vector in Code-Excited Linear Predictive (CELP) coding, these vocoders cannot achieve good synthetic speech quality at lower bit rates. In current vocoder designs, the Multi-Band Excitation (MBE) model builds on these two kinds of algorithms and offers further improved performance, overcoming the disadvantages of the LPC and CELP models.

1. Multi-Band Excitation Speech Coding Technology

MBE coding was originally proposed by D. W. Griffin at MIT in 1987 at 9.6 kb/s and was later improved to reach 2.4 kb/s. In the MBE model, the basic approach is to divide the speech signal into overlapping frames using an analysis window. Working in the frequency domain, each short-time frame is divided into distinct frequency bands and analyzed according to the speech model. The vocoder makes a voiced/unvoiced decision for each frequency band and estimates the model parameters by explicit comparison between the original speech spectrum and the synthetic speech spectrum. During synthesis, the voiced portion of the speech is synthesized in the time domain and the unvoiced portion in the frequency domain, using a periodic impulse sequence spectrum (for voiced speech) and a random white noise sequence spectrum (for unvoiced speech) as the respective excitation signals.
The voiced and unvoiced components are then summed to form the full-band synthetic speech.

2. Speech Analysis

Building on the MBE speech model, this work improves the technology in several respects. First, the input speech signal is analyzed to set up a feasible speech model and to estimate the model parameters accurately. In this speech model, the parameters that must be transmitted over the channel consist of the pitch period, the spectral envelope, and the voiced/unvoiced decision for each harmonic.

Statistics show that if a speech signal has an even spectral structure, its spectrum can be replaced by the LPC spectrum. After preprocessing, the spectral envelope can therefore be estimated through LPC analysis. Using the conjugate gradient algorithm, the parameters and , representing the spectral envelope information of every speech frame in each frequency band, are obtained rapidly and accurately.

The pitch period is an important parameter in the analysis and synthesis of speech signals. Classical pitch detectors estimate the pitch period by a direct approach such as the autocorrelation function method, the average magnitude difference function method, or the cepstrum method. These non-event-based pitch detectors assume that the pitch period is stationary within each segment and that each segment contains at least two full pitch periods. The disadvantages of these methods are that they are a) insensitive to nonstationary variations in pitch period over the segment length and b) unsuitable for covering both low-pitched and high-pitched speakers. In this work, we apply a time-scale representation known as the dyadic wavelet transform (DyWT) to locate glottal closure (GC). We choose the quadratic spline wavelet, which is the first derivative of the cubic spline, as the mother wavelet. We then compute the wavelet transform of the input signal with a scale parameter discretized along the dyadic sequence (powers of two), and locate the modulus maxima positions on every scale to detect the instant of GC.
The distance between two adjacent maxima is the pitch period. Experiments show that this DyWT pitch detection algorithm is simple, accurate, and robust to noise. It is also potentia...
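
The DyWT pitch detection idea described above can be illustrated with a minimal sketch. It assumes the Mallat-Zhong filter pair that is commonly associated with the quadratic spline wavelet and an undecimated ("a trous") filter bank. The function names, the 0.5 relative threshold, the choice of a single scale for peak picking, and the toy test signal are all illustrative assumptions, not the implementation used in the thesis.

```python
import math
import statistics

# Filter pair commonly paired with the quadratic spline wavelet (Mallat-Zhong).
# Assumed here; the abstract does not list the filters used in the thesis.
LOWPASS = [0.125, 0.375, 0.375, 0.125]
HIGHPASS = [-2.0, 2.0]


def atrous_filter(signal, taps, hole):
    """Causal FIR filter whose taps are spaced `hole` samples apart."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i - k * hole
            if j >= 0:  # samples before the start are treated as zero
                acc += tap * signal[j]
        out.append(acc)
    return out


def dyadic_wavelet_transform(signal, levels=3):
    """Return the detail signals at scales 2^1, 2^2, ..., 2^levels."""
    details = []
    approx = list(signal)
    for level in range(levels):
        hole = 2 ** level  # tap spacing doubles at every scale
        details.append(atrous_filter(approx, HIGHPASS, hole))
        approx = atrous_filter(approx, LOWPASS, hole)
    return details


def modulus_maxima(detail, rel_threshold=0.5):
    """Indices of local maxima of |detail| above a relative threshold.

    The threshold value is a hypothetical choice for this sketch.
    """
    mag = [abs(v) for v in detail]
    floor = rel_threshold * max(mag)
    return [
        i
        for i in range(1, len(mag) - 1)
        if mag[i] >= floor and mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]
    ]


def estimate_pitch_period(signal, scale_index=1):
    """Median spacing (in samples) between adjacent modulus maxima.

    The causal filters delay every maximum by the same amount, so the
    delay cancels when adjacent positions are subtracted.
    """
    detail = dyadic_wavelet_transform(signal, levels=scale_index + 1)[scale_index]
    peaks = modulus_maxima(detail)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return statistics.median(gaps) if gaps else None


def toy_voiced_signal(period=80, length=400, decay=20.0):
    """Toy stand-in for voiced speech: a sharp jump every `period` samples."""
    return [math.exp(-(n % period) / decay) for n in range(length)]
```

Calling `estimate_pitch_period(toy_voiced_signal(period=80))` recovers the spacing of the abrupt jumps in the toy signal. Real speech would need preprocessing and agreement of maxima across several scales, as the abstract indicates, which this sketch does not attempt.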
Keywords/Search Tags:Low Bit Rate Speech Coding, Multi-band Excitation, Speech Enhancement, Dyadic Wavelet Transform