
Research On Algorithms Of Single Channel Speech Watermarking And Speech Enhancement

Posted on: 2018-05-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Liu
Full Text: PDF
GTID: 1318330542470554
Subject: Information and Communication Engineering
Abstract/Summary:
This thesis is devoted to research on speech watermarking and speech enhancement. On the one hand, speech watermarking is a branch of information hiding, which plays an increasingly important role in information security. It can guarantee the integrity of information in transmission, and it can be applied to areas such as copyright protection, identification, authentication, digital forensics, covert communication, bandwidth extension and legacy system enhancement. On the other hand, speech enhancement is a branch of information retrieval. In practice, not only the signal itself but also its spectrum and features can be enhanced. Because interference and noise are present in virtually all conditions, enhancement is necessary for further and better information processing. Speech enhancement can improve the listening experience for voice communication in noisy environments, increase the recognition rate of speech recognition when ambient noise is dominant, and improve communication for hearing-impaired listeners. For these reasons, in-depth study of these two topics has academic significance and long-term economic and social value.
Both speech watermarking and speech enhancement have attracted considerable research interest, but many difficulties remain in both areas. In speech watermarking there are two main issues. The first is robustness: for example, it is hard to resist channel attacks in the PSTN. The second is achieving a high embedding data rate in narrowband speech. Speech enhancement is a basic problem in speech signal processing for which many methods have been proposed, yet several problems remain. For instance, it is not easy to fully exploit the characteristics of speech itself, to effectively remove nonstationary noise, or to eliminate artificial noise.
To tackle these problems, algorithms for speech watermarking and speech enhancement are studied in depth in this work. The author's major effort consists of two parts. The first part concerns speech watermarking. Firstly, for the PSTN channel, a robust watermarking scheme based on spread spectrum and perceptual filtering is proposed, which can resist attacks such as bandpass filtering, requantization and companding. Secondly, a watermarking scheme with high data capacity is proposed on the basis of subband replacement and a spectral envelope constraint; it exploits the energy distribution of the speech spectrum in the low-frequency and high-frequency ranges and the insensitivity of the human perceptual system to high frequencies. The second part is dedicated to speech enhancement. Firstly, a lower bound for autoregressive parameter estimation is derived, and an iterative Wiener filtering method is proposed to estimate the spectrum of clean speech. Secondly, speech enhancement is achieved by combining autoregressive modeling with line spectrum frequency tracking, in which temporal dependence between speech frames is assumed. Spectral estimation is improved by Kalman filtering, which lets the method adapt to both stationary and nonstationary environments and reduces musical tones. Lastly, a lower bound for parameter estimation in a real harmonic model is derived, and spectral estimation is improved by using pitch estimation and comb filtering.
More details and the main contributions of this research are as follows:
1. After analyzing the PSTN voiceband channel attacks, an algorithm for speech watermarking over the PSTN voiceband channel based on spread spectrum and perceptual filtering is proposed. This scheme improves the methods for generating, embedding and extracting a watermark.
To combat the bandpass filtering attack, a Manchester Non-Return-to-Zero code is used as the pulse shape of the spread spectrum code when generating the watermark, and modified designs of the psychoacoustic model and the perceptual filter, based on a subband technique, are used when embedding and extracting the watermark. To combat the line card attack, a preprocessing algorithm for the watermarked signal is proposed. The theoretical embedding capacity of the proposed algorithm is also derived. Experimental results show that the watermarking system is robust against bandpass filtering, requantization and companding attacks, and has a high capacity and good listening test results. Under joint attacks, the bit error rate is below 0.005 at a rate of 25 bps, and the perceptual evaluation of speech quality (PESQ) score is more than 4.2.
2. A narrowband speech watermarking algorithm is proposed that exploits the insensitivity of the human auditory system to frequency components above the third formant frequency of speech. To determine the frequency range for subband replacement, a method for evaluating Gaussianity and a parametric method for estimating the probability density function of the third formant frequency are presented. To keep the watermark imperceptible, a power threshold method is used to scale and spectrally constrain it. To adapt to the time-varying channel and reduce the bit error rate, a training sequence is inserted into the hidden message sequence when embedding the watermark, and an equalization method is introduced when extracting it. In addition, the performance of the proposed watermarking system is analyzed theoretically in terms of embedding capacity and bit error rate.
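The training-sequence idea can be illustrated with a minimal sketch. It assumes a flat channel modeled by a single real gain and antipodal (+1/-1) watermark symbols. The names `estimate_gain` and `equalize`, the frame layout, and the single-tap least-squares estimator are illustrative assumptions, not the equalizer used in the thesis.

```python
def estimate_gain(received, training):
    """Least-squares estimate of one real channel gain h in r = h * t + noise."""
    num = sum(r * t for r, t in zip(received, training))
    den = sum(t * t for t in training)
    return num / den


def equalize(received, gain):
    """Undo the estimated gain, then decide each symbol by its sign."""
    return [1 if r / gain >= 0 else -1 for r in received]


# Hypothetical frame: known training symbols followed by hidden message symbols.
training = [1, -1, 1, 1, -1, -1, 1, -1]
message = [1, 1, -1, 1, -1, -1]
channel_gain = -0.6  # assumed channel: attenuation plus a polarity inversion
received = [channel_gain * s for s in training + message]

# Estimate the gain from the training part only, then equalize the message part.
h_hat = estimate_gain(received[:len(training)], training)
decoded = equalize(received[len(training):], h_hat)
```

In this toy case, a sign decision without equalization would flip every message bit because of the polarity inversion; the gain learned from the training symbols corrects it.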
Experimental results show that an efficient frequency range for subband replacement can be obtained by analyzing the statistics of the third formant frequency of speech. Furthermore, the proposed watermarking algorithm has a high capacity of 1.2 kbps, is robust to various attacks and achieves good listening test results.
3. Speech enhancement based on an autoregressive model is formulated as a problem of parameter estimation and optimal filtering for a noisy autoregressive process. To evaluate the performance of parameter estimation, an asymptotic Cramér-Rao lower bound is derived in the frequency domain. To enhance the speech spectrum, an iterative parameter estimation method based on the maximum likelihood (ML) criterion and an iterative Wiener filter based on the maximum a posteriori (MAP) criterion are proposed. Experimental results show that the proposed method estimates parameters accurately, converges fast, asymptotically reaches the Cramér-Rao lower bound, and is suitable for both low-order and high-order parametric spectral estimation of autoregressive processes. In addition, when the input signal-to-noise ratio (SNR) is between 0 dB and 5 dB, the enhanced signals obtain a gain of up to 3 dB.
4. An autoregressive-model-based speech enhancement method using line spectrum frequencies is proposed. Clean speech is reconstructed from model parameters estimated from noisy speech. Specifically, the spectral envelope is estimated by tracking temporal trajectories in order to improve the distorted short-time spectral amplitude. Noisy speech is preprocessed so that the spectral gain can be estimated more accurately using linear prediction analysis. The spectral envelope is enhanced by tracking the line spectrum frequencies with Kalman filtering. The parameters of the Kalman filter are estimated by codebook mapping and maximum likelihood estimation. A complete system design and experimental validation are given in detail.
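As a rough illustration of tracking one line spectrum frequency across frames, the sketch below runs a scalar Kalman filter under a random-walk state model. The function name `kalman_track`, the random-walk model, and the hand-picked variances `q` and `r` are assumptions made for illustration; the thesis estimates the Kalman parameters by codebook mapping and maximum likelihood, which is not reproduced here.

```python
def kalman_track(measurements, q=1e-4, r=1e-2):
    """Scalar Kalman filter for x[k] = x[k-1] + w, z[k] = x[k] + v.

    q is the assumed process-noise variance, r the assumed measurement-noise
    variance. Returns one filtered estimate per measurement.
    """
    x = measurements[0]  # initialize the state with the first observation
    p = r                # initial error variance equals the measurement variance
    estimates = [x]
    for z in measurements[1:]:
        p_pred = p + q                # predict: variance grows by process noise
        k = p_pred / (p_pred + r)     # Kalman gain
        x = x + k * (z - x)           # update with the innovation
        p = (1.0 - k) * p_pred        # posterior error variance
        estimates.append(x)
    return estimates


# Hypothetical per-frame LSF observations (radians): a constant true value of
# 1.0 corrupted by an alternating +/-0.1 error.
noisy_lsf = [1.0 + (0.1 if i % 2 == 0 else -0.1) for i in range(50)]
smoothed = kalman_track(noisy_lsf)
```

A small `q` relative to `r` encodes the assumption that the frequency changes slowly between frames, so frame-to-frame jitter is smoothed away. That jitter is the kind of fluctuation associated with musical tones.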
Performance evaluations based on spectrograms, objective measures and a subjective listening test show that the proposed approach achieves significant improvement over conventional methods in diverse conditions. For example, PESQ scores are increased by 0.3-0.7. A major feature of the proposed method is that it significantly reduces musical tones in the enhanced speech.
5. Speech enhancement based on a real harmonic multiple sinusoids model is formulated as a problem of parameter estimation for a noisy real harmonic multiple sinusoids model. To evaluate the performance of parameter estimation, an asymptotic Cramér-Rao lower bound is derived. To estimate the parameters more accurately, a pre-processing algorithm based on period estimation and comb filtering is proposed. Experimental results show that parameter estimation with pre-processing is more accurate than without it and attains the asymptotic Cramér-Rao lower bound. Besides, the enhanced signals obtain a gain of up to 6 dB. Furthermore, for both high-order and low-order real harmonic multiple sinusoids models, the speech spectrum can be enhanced when pre-processing is employed.
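The comb-filtering pre-processing mentioned in the last contribution can be sketched as follows, assuming the pitch period is already known as an integer number of samples. The two-tap feedforward comb and the name `comb_filter` are illustrative choices, not the filter design from the thesis.

```python
import math


def comb_filter(x, period):
    """Feedforward comb y[n] = 0.5 * (x[n] + x[n - period]).

    Components that repeat every `period` samples pass unchanged; components
    that change sign after `period` samples are cancelled.
    """
    y = []
    for n, sample in enumerate(x):
        delayed = x[n - period] if n >= period else 0.0
        y.append(0.5 * (sample + delayed))
    return y


period = 8  # assumed pitch period in samples
# A component at the fundamental (period 8) and an interferer at half the
# fundamental frequency (period 16).
harmonic = [math.sin(2 * math.pi * n / period) for n in range(64)]
interferer = [math.sin(math.pi * n / period) for n in range(64)]

kept = comb_filter(harmonic, period)
removed = comb_filter(interferer, period)
```

After the first `period` samples, `kept` matches `harmonic` while `removed` is numerically zero. This is why a comb matched to the estimated period can clean up a harmonic signal before its parameters are estimated.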
Keywords/Search Tags: Information hiding, speech watermark, noise reduction, speech enhancement, spread spectrum, perceptual filtering, subband replacement, autoregression model, real harmonic multiple sinusoids model, Kalman filter, line spectrum frequencies