Research On Critical Algorithms Of Multiband Excitation Vocoder

Posted on: 2014-11-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Q Zhou
Full Text: PDF
GTID: 1268330398985723
Subject: Information and Communication Engineering
Abstract/Summary:
The multiband excitation (MBE) vocoder can in theory produce high-quality synthesized speech, and it is one of the most widely studied speech models for low-bit-rate applications. In practice, however, errors are readily introduced during parameter estimation, parameter quantization, wireless transmission, and speech synthesis. These errors degrade speech quality, especially at low bit rates and when the speech is corrupted by background noise and channel noise at the same time. The main purpose of this dissertation is to improve the critical algorithms used in parameter estimation and speech synthesis.

The thresholds of the constraint equations in the traditional two-path pitch tracking algorithm are all fixed at empirical values, which makes pitch estimation less robust. First, a new difference inequality is constructed as the sole constraint, in a form that lends itself to an adaptive threshold adjustment model. Next, the characteristics of multiple and sub-multiple pitch interference within a single frame are studied by analyzing the maximum difference between the single-frame fitting error of the standard pitch and that of its multiples and sub-multiples. The analysis shows that a threshold that is too low weakens the suppression of multiple pitch interference, while a threshold that is too high introduces more sub-multiple pitch errors. Consequently, two algorithms are proposed: a two-threshold adaptive two-path pitch tracking algorithm, whose difference threshold is updated according to the pitch estimate length feature of the previous frames, and a whole-range adaptive two-path pitch tracking algorithm, whose difference threshold is updated according to the multiple pitch error identification of the previous frame. In experiments, the gross error rate (GER) of both proposed algorithms is generally lower than that of the traditional algorithm, particularly for female speech.
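To make the two evaluation metrics concrete, the following is a minimal sketch of how a gross error rate and a mean error could be computed from per-frame pitch periods. The abstract does not define either metric, so the details here are assumptions: a frame counts as a gross error when the estimate deviates from the reference by more than 20% (the `gross_tol` parameter is hypothetical), and the mean error is the average absolute deviation over the remaining frames. The function name `pitch_error_metrics` is likewise illustrative.

```python
def pitch_error_metrics(reference, estimate, gross_tol=0.2):
    """Return (GER, ME) for paired per-frame pitch periods.

    Assumed definitions (not taken from the dissertation):
    - a frame is a gross error if |est - ref| > gross_tol * ref;
    - ME is the mean absolute error over the non-gross frames.
    Frames with a non-positive reference are treated as unvoiced and skipped.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")

    gross = 0
    fine_errors = []
    for ref, est in zip(reference, estimate):
        if ref <= 0:
            continue  # no reference pitch for this frame
        err = abs(est - ref)
        if err > gross_tol * ref:
            gross += 1  # typically a pitch doubling or halving
        else:
            fine_errors.append(err)

    voiced = gross + len(fine_errors)
    ger = gross / voiced if voiced else 0.0
    me = sum(fine_errors) / len(fine_errors) if fine_errors else 0.0
    return ger, me
```

Under these assumptions, a doubled or halved pitch period always counts as a gross error, which is why suppressing multiple and sub-multiple pitch errors lowers the GER while the ME can still rise slightly.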
The average GER improvements for female speech are 82.13% and 82.19% at a signal-to-noise ratio (SNR) of -5 dB for the two proposed algorithms, respectively. The mean error (ME) generally increases, but only slightly, so overall the ME losses are negligible compared with the GER improvements. The experimental results indicate that both algorithms perform well across different speakers and white noise conditions, and that the gain in pitch estimation accuracy is largest in severely noisy conditions.

Sub-band division inevitably introduces additional misjudgments. In existing research, the sub-bands are divided empirically, which can cause severe distortion. Because the distortion is closely related to the sub-band energy, a new sub-band division method based on the spectral distribution is proposed. It takes both the harmonic structure and the harmonic energy distribution of the current frame into account, and its rule balances bandwidth against energy. In addition, because background noise always reduces the degree of voicing of a sub-band, a new sub-band voiced/unvoiced (V/UV) decision method based on correlation enhancement is proposed to make the decision more robust. The V/UV clusters of clean and noisy speech at different SNR levels are mapped with the Fisher optimum projection vector, yielding new V and UV clusters that are clearly separated and thereby strengthening the relation between the eigenvalue and the V/UV state.
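The Fisher projection step can be illustrated with a small sketch. It computes the classical Fisher discriminant direction, proportional to the inverse within-class scatter matrix times the difference of the class means, for two clusters of two-dimensional feature vectors. The choice of two features per sub-band, the sample values in the usage note, and the function names are assumptions made for illustration; the abstract does not specify the feature set.

```python
import math


def fisher_direction(class_v, class_uv):
    """Unit-length Fisher discriminant direction for 2-D feature points.

    class_v and class_uv are lists of (x, y) feature pairs for the voiced
    and unvoiced clusters (the features themselves are hypothetical).
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    m_v, m_uv = mean(class_v), mean(class_uv)
    a1, b1, c1 = scatter(class_v, m_v)
    a2, b2, c2 = scatter(class_uv, m_uv)
    sxx, sxy, syy = a1 + a2, b1 + b2, c1 + c2  # within-class scatter matrix

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")

    dx, dy = m_v[0] - m_uv[0], m_v[1] - m_uv[1]
    # Multiply the inverse of [[sxx, sxy], [sxy, syy]] by the mean difference.
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = math.hypot(wx, wy)
    return wx / norm, wy / norm


def project(points, w):
    """Project each 2-D point onto the direction w."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]
```

For example, with `voiced = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.1), (0.95, 0.25)]` and `unvoiced = [(0.3, 0.2), (0.2, 0.3), (0.25, 0.1), (0.35, 0.25)]`, projecting both clusters onto `fisher_direction(voiced, unvoiced)` gives two groups of scalar values that do not overlap, so a single threshold on the projected value separates them.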
In addition, the sub-band V/UV decision of the previous frame and the correlation coefficient matrix of the V/UV states over time are incorporated into the maximum a posteriori criterion, which brings the relation between the V/UV states of adjacent frames into the decision. The experimental results show that the optimized algorithm is strongly noise-resistant under various background conditions and that speech quality is improved.

For harmonic spectrum amplitude estimation, low-rate MBE vocoders mainly use a linear predictive (LP) autoregressive model to approximate the MBE spectrum amplitude vectors, and the LP parameters are then converted to line spectral frequency (LSF) parameters. However, the autocorrelation sequence becomes inaccurate when the MBE spectrum density sequence is sparse or when there is an offset in the frequency-domain sampling, which can lead to a large approximation error between the LP spectrum and the MBE spectrum. A correction method based on MBE spectrum density interpolation is therefore proposed. It expands the MBE spectrum density sequence into a new sequence that is uniformly and adequately sampled in the frequency domain, and it corrects the LP gain so that the interpolation does not cause the total power to fluctuate differently from frame to frame. In addition, the LSF coefficient extraction algorithm cannot guarantee that the LSF coefficients are in ascending order, so an LSF coefficient optimization method is proposed. It fine-tunes the LSF coefficients in two directions and takes the average of the two results as the final LSF coefficients.
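One possible reading of the two-direction LSF adjustment is sketched below. The abstract does not give the actual tuning rule, so this is an assumption: a forward pass pushes each coefficient up until it exceeds its predecessor by a minimum gap, a backward pass pushes each coefficient down until it sits below its successor by the same gap, and the two results are averaged. The `min_gap` parameter and the function name are hypothetical, and clamping to the valid LSF range is left out for brevity.

```python
def reorder_lsf(lsf, min_gap=0.01):
    """Return LSF values in ascending order, spaced by at least min_gap.

    Sketch only: the forward/backward passes and min_gap are assumptions,
    not the dissertation's published procedure.
    """
    n = len(lsf)
    if n == 0:
        return []

    # Forward pass: raise any value that is too close to (or below) its
    # left neighbour.
    forward = list(lsf)
    for i in range(1, n):
        forward[i] = max(forward[i], forward[i - 1] + min_gap)

    # Backward pass: lower any value that is too close to (or above) its
    # right neighbour.
    backward = list(lsf)
    for i in range(n - 2, -1, -1):
        backward[i] = min(backward[i], backward[i + 1] - min_gap)

    # Both passes are ascending with gaps of at least min_gap, so their
    # element-wise average has the same property.
    return [(f + b) / 2 for f, b in zip(forward, backward)]
```

Averaging spreads the correction over both neighbours of a misordered pair instead of moving only one of them, and an input that is already correctly ordered is returned unchanged.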
The experimental results show that the improved algorithm effectively reduces the LP spectrum envelope estimation error, avoids the local abnormal peaks that otherwise occur in the synthesized speech, and improves speech quality.

For speech synthesis, low-bit-rate MBE vocoders regenerate the phases of the voiced harmonics rather than transmitting them. Existing phase generation methods can make the speech waveform unbalanced and give the synthesized speech signal a high peak-to-average power ratio (PAPR), which raises the probability of saturation distortion or shortens the standby time of a practical system. Therefore, an optimal initial phase design method based on a computer-based traversal ("ergodic") search is proposed. Each harmonic component is assigned a random initial phase under the assumption that all harmonics are voiced, and the set of phases that yields the minimum speech signal peak is taken as the optimal initial phases. These optimal initial phases suppress the waveform imbalance with almost no additional complexity in a practical system, which makes the method highly practical and valuable in applications.
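The initial phase search described above can be sketched as a simple random search. The sketch assumes unit-amplitude harmonics, a single pitch period of `period` samples, and a fixed number of random trials; these choices, along with the function names and the `trials` and `seed` parameters, are illustrative and not taken from the dissertation.

```python
import math
import random


def synth_period(phases, period):
    """One pitch period of a sum of unit-amplitude harmonics.

    Harmonic k (1-based) has frequency k / period cycles per sample and
    initial phase phases[k - 1].
    """
    w0 = 2 * math.pi / period
    return [
        sum(math.cos(w0 * (k + 1) * n + ph) for k, ph in enumerate(phases))
        for n in range(period)
    ]


def papr(signal):
    """Peak-to-average power ratio of a sampled signal."""
    peak = max(x * x for x in signal)
    mean = sum(x * x for x in signal) / len(signal)
    return peak / mean


def search_initial_phases(num_harmonics, period, trials=200, seed=0):
    """Keep the random phase set that gives the smallest waveform peak."""
    rng = random.Random(seed)
    best_phases, best_peak = None, math.inf
    for _ in range(trials):
        phases = [rng.uniform(0, 2 * math.pi) for _ in range(num_harmonics)]
        peak = max(abs(x) for x in synth_period(phases, period))
        if peak < best_peak:
            best_phases, best_peak = phases, peak
    return best_phases, best_peak
```

For instance, `search_initial_phases(10, 64)` can be compared against the all-zero phase set, where every harmonic peaks at the same sample and the waveform peak equals the number of harmonics. Because the search runs offline and the resulting phases are stored as constants, the synthesizer itself needs no extra computation at run time, which matches the low-complexity claim in the abstract.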
Keywords/Search Tags: speech compression, multiband excitation vocoder, pitch tracking, sub-band voiced/unvoiced decision, linear prediction analysis, speech synthesis