Acoustic echo is a common phenomenon in applications such as hands-free communication and video conferencing. Due to the coupling between the loudspeaker and the microphone, acoustic echo can considerably impair speech intelligibility and listening comfort. With the development of artificial intelligence technology, smart devices with speech interaction, e.g. smart speakers, are entering our daily lives. These devices usually have compact designs, so the microphone is close to the loudspeaker, which exacerbates the acoustic echo problem. Furthermore, the loudspeakers on these devices are usually small and suffer more from non-linear distortion at low frequencies, making the echo more difficult to suppress. As a result, the automatic speech recognition performance of these devices deteriorates significantly in the presence of acoustic echo.

This thesis investigates acoustic echo cancellation (AEC) algorithms for the applications mentioned above, with a special focus on improving the adaptive filter algorithm for AEC and on techniques for residual echo suppression. A conventional signal model for AEC is analyzed, and commonly used time-domain and frequency-domain adaptive algorithms are reviewed.

Moreover, the steady-state solution of the frequency-domain Kalman filter (FKF) is investigated. It is found that the steady-state equivalent time-domain weight vector of the FKF is biased away from the optimal solution, namely the Wiener solution, when the adaptive filter is of deficient length. To resolve this performance deterioration, an efficient improvement of the FKF is proposed. By rearranging the weight update of the FKF, the modified FKF is guaranteed to achieve the optimal steady-state behavior. The extra computational burden of the modified FKF is low, which eases implementation in practical systems. Simulations with a measured room impulse response and speech signals verify its advantage over the original FKF in under-modelling scenarios.

When the loudspeaker is small and close to the microphone, the relationship between the echo and the reference signal can be highly non-linear. Under such circumstances, adaptive filters based on linear models cannot suppress the echo effectively, and post-processing is required for residual echo suppression. This thesis therefore discusses a residual echo suppression method based on the combination of a deep speech generative model and non-negative matrix factorization. Temporal convolutional networks are used in the encoder and decoder of the deep speech generative model to lower the computational burden. Objective evaluation results show that the proposed method outperforms traditional signal-processing-based methods.

Finally, a brief summary of the thesis is presented and directions for future work are discussed.
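
As a point of reference for the steady-state analysis summarized above, the optimal time-domain solution is the classical Wiener filter. The following sketch uses generic shorthand (autocorrelation matrix R, cross-correlation vector p, adaptive filter length N); this notation is assumed for illustration and is not necessarily the notation used in the thesis itself:

\[
\mathbf{w}_{\mathrm{opt}} = \mathbf{R}^{-1}\mathbf{p},
\qquad
\mathbf{R} = \mathrm{E}\!\left\{\mathbf{x}(n)\,\mathbf{x}^{\mathsf{T}}(n)\right\},
\qquad
\mathbf{p} = \mathrm{E}\!\left\{\mathbf{x}(n)\,d(n)\right\},
\]

where x(n) collects the N most recent reference (loudspeaker) samples and d(n) is the microphone signal. The deficient-length (under-modelling) case corresponds to N being smaller than the effective length of the echo path; under this condition the equivalent time-domain weight vector of the original FKF converges to a point biased away from the Wiener solution, whereas the rearranged weight update of the modified FKF restores the optimal steady-state behavior.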