| The 21st century has seen rapid development of the intelligent information industry, and the emergence of intelligent audio and video conferencing systems has greatly facilitated remote work and cross-regional communication. Echo cancellation, a key component in ensuring the quality of conference calls, has therefore attracted extensive attention. However, traditional echo cancellation algorithms based on adaptive filtering suffer from slow convergence, high computational complexity, and low double-talk detection accuracy, so their ability to handle nonlinear acoustic echo is limited. Traditional single-microphone devices also struggle to meet practical needs because of their small pickup range and poor spatial realism. To address these problems, this thesis builds an efficient multi-channel echo cancellation method that exploits the strong nonlinear modeling and self-learning capabilities of deep neural networks. The main research contents of this thesis are as follows: (1) To address the shortcomings of traditional adaptive-filtering echo cancellation, and drawing on the strength of RNNs in processing time-series signals, this thesis constructs an LSTM-based echo cancellation network to replace the traditional adaptive filtering algorithm and extends it from single-channel to multi-channel echo cancellation. Experiments show that the LSTM-based multi-channel echo cancellation method eliminates echo more effectively. (2) A multi-channel spectrum-masking echo cancellation method based on ECA-CRN is proposed. In this method, the encoder-decoder is built from multi-layer convolutional neural networks to strengthen the network's ability to extract features at different levels, overcoming shortcomings of the fully connected layers in the LSTM structure, such as redundant parameters and difficulty in extracting local invariant features. At the same 
time, the ECA-Net channel attention module is integrated into the encoder-decoder to increase the network's focus on important features. In addition, switchable normalization is incorporated so that the network can adaptively select an appropriate normalization for input signals with different distributions. Finally, comparison with other methods verifies that the ECA-CRN method achieves better echo cancellation performance and effectively improves the perceived quality of speech. (3) A multi-channel spectrum-masking echo cancellation method based on MP_U-Net is proposed. This method adopts a fully convolutional structure, which overcomes the large parameter count and slow computation of the LSTM in ECA-CRN. At the same time, a spectrum-masking algorithm that combines magnitude and phase is adopted to avoid the phase distortion caused by using a magnitude mask alone. In addition, a channel attention module with low model complexity is added to further improve network performance. Finally, experimental results under single-talk and double-talk conditions show that the MP_U-Net-based method achieves better echo cancellation than other deep-learning-based methods. |
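
The difference between a magnitude-only mask and a combined magnitude-and-phase mask, as mentioned in item (3), can be illustrated with a toy sketch. This is not the thesis's network: the masks here are oracle values computed from a known near-end signal, the three complex numbers stand in for time-frequency bins of an STFT, and all function names are hypothetical.

```python
import cmath

def apply_magnitude_mask(mic_bins, mag_mask):
    """Scale each microphone bin's magnitude; the microphone phase is kept."""
    return [m * y for m, y in zip(mag_mask, mic_bins)]

def apply_magnitude_phase_mask(mic_bins, mag_mask, phase_shift):
    """Scale the magnitude and also rotate the phase of each bin."""
    return [m * y * cmath.exp(1j * p)
            for m, y, p in zip(mag_mask, mic_bins, phase_shift)]

def oracle_masks(mic_bins, target_bins):
    """Oracle masks from a known target (assumption: illustration only;
    in a learned system these values would be predicted by the network)."""
    mag_mask = [abs(s) / abs(y) for s, y in zip(target_bins, mic_bins)]
    phase_shift = [cmath.phase(s) - cmath.phase(y)
                   for s, y in zip(target_bins, mic_bins)]
    return mag_mask, phase_shift

def squared_error(estimate, reference):
    """Sum of squared complex differences between two bin lists."""
    return sum(abs(a - b) ** 2 for a, b in zip(estimate, reference))

# Toy complex spectra: microphone signal = near-end speech + echo.
near_end = [1.0 + 0.5j, -0.3 + 0.8j, 0.6 - 0.2j]
echo = [0.4 - 0.7j, 0.5 + 0.1j, -0.2 + 0.9j]
mic = [s + e for s, e in zip(near_end, echo)]

mag_mask, phase_shift = oracle_masks(mic, near_end)
est_mag_only = apply_magnitude_mask(mic, mag_mask)
est_mag_phase = apply_magnitude_phase_mask(mic, mag_mask, phase_shift)

err_mag_only = squared_error(est_mag_only, near_end)
err_mag_phase = squared_error(est_mag_phase, near_end)
print(f"magnitude-only error:      {err_mag_only:.6f}")
print(f"magnitude-and-phase error: {err_mag_phase:.6f}")
```

Even with a perfect magnitude mask, the magnitude-only estimate keeps the echo-corrupted microphone phase and so leaves a residual error, while the combined mask recovers the near-end bins up to floating-point rounding.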