In recent years, with the rapid development of intelligent speech technology, intelligent terminal products such as smart home appliances, audio and video systems, wearable devices, and robots have attracted wide attention and application, and speech interaction has increasingly become the most convenient human-machine interface for intelligent terminals. At the same time, interaction scenarios have grown more complicated, especially in indoor settings such as daily life and conferences, where reverberation, noise interference, and moving sound sources degrade the quality of the target speech and harm the speech interaction experience. Speech enhancement is a signal processing technology that studies how to extract a cleaner speech signal from an interfered signal. Compared with a single microphone, a microphone array offers more robust performance and stronger anti-interference ability in complicated environments with high reverberation and low signal-to-noise ratio, and has therefore attracted much attention in speech enhancement research. Sound source localization and beamforming are two research hotspots of microphone array technology.

This paper addresses the complicated scenario of indoor human-computer interaction, in which reverberation and noise interference greatly degrade the signal quality of the target sound source. It studies microphone array sound source localization and deep-learning-based beamforming for speech enhancement, and verifies the performance of the proposed methods through experiments. The main work and innovations of this paper are as follows:

1. For the problem of localizing a moving sound source under high reverberation and low signal-to-noise ratio, a localization algorithm based on compressed sensing is proposed. It exploits the correlation between the frequency bins of speech to perform joint sparse estimation in the frequency domain, and its performance is verified through experiments.

2. Deep learning is introduced into beamformer design, combining the advantages of the traditional filter-and-sum algorithm with data-driven learning: the first stage initializes the network using the filter-and-sum coefficients as the target, and the second stage optimizes against the target speech, establishing a higher-order nonlinear mapping from the data and thereby achieving data-driven optimized beamforming. Simulations and experiments verify the effectiveness of this method in improving speech quality in complicated environments.
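To make the joint-sparsity idea in contribution 1 concrete, the sketch below (not the thesis algorithm itself; array geometry, frequencies, angular grid, and solver are all illustrative assumptions) builds a steering-vector dictionary per frequency bin for a uniform linear array and estimates the direction of arrival by maximizing the correlation energy summed across bins, i.e. by enforcing a sparse support shared by all frequencies rather than estimating each bin independently:

```python
import numpy as np

c = 343.0                 # speed of sound (m/s)
M = 8                     # microphones in a uniform linear array
d = 0.04                  # inter-microphone spacing (m)
freqs = [1000.0, 1500.0, 2000.0, 2500.0]      # analysis bins (Hz), assumed
grid = np.deg2rad(np.arange(0, 181))          # candidate DOAs, 1-degree grid

def steering(f, theta):
    """M x len(theta) far-field steering matrix for the ULA."""
    m = np.arange(M)[:, None]
    return np.exp(-2j * np.pi * f * d * m * np.cos(theta)[None, :] / c)

# Simulate one narrowband snapshot per bin from a source at 60 degrees + noise
rng = np.random.default_rng(0)
true_doa = np.deg2rad(60.0)
snaps = []
for f in freqs:
    a = steering(f, np.array([true_doa]))[:, 0]
    s = rng.standard_normal() + 1j * rng.standard_normal()
    x = a * s + 0.1 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
    snaps.append(x)

# Joint-sparse spectrum: correlation energy accumulated across frequency bins,
# so the peak reflects a support common to all bins
P = np.zeros(len(grid))
for f, x in zip(freqs, snaps):
    A = steering(f, grid)                 # dictionary for this bin
    P += np.abs(A.conj().T @ x) ** 2

est = np.rad2deg(grid[np.argmax(P)])
print(est)                                # estimated DOA in degrees
```

Summing the correlation energy before taking the argmax is the simplest way to exploit the cross-frequency correlation the abstract refers to; a full compressed-sensing solver would replace the single peak search with a group-sparse recovery over the same per-bin dictionaries.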
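The two-stage scheme in contribution 2 can be caricatured with linear per-bin weights standing in for the deep network (the actual method learns a nonlinear mapping): stage 1 initializes the beamformer with filter-and-sum coefficients, and stage 2 refines it in a data-driven way by gradient descent toward the target speech. A minimal numpy sketch on synthetic multichannel STFT data, with all dimensions and signals assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
M, F, T = 4, 64, 200           # mics, frequency bins, time frames (assumed)

# Synthetic multichannel STFT: target S arrives via steering vectors a,
# corrupted by diffuse noise (stand-ins for real room recordings)
a = np.exp(-1j * 2 * np.pi * rng.random((M, F)))
S = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
N = 0.5 * (rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T)))
X = a[:, :, None] * S[None, :, :] + N

# Stage 1: initialize with filter-and-sum coefficients (distortionless: w^H a = 1)
W = (a / M).copy()             # one complex weight vector per frequency bin

def mse(W):
    Y = np.einsum('mf,mft->ft', W.conj(), X)   # beamformer output
    return np.mean(np.abs(Y - S) ** 2)

err0 = mse(W)

# Stage 2: data-driven refinement toward the target speech
# (plain complex least-squares gradient steps here, in place of network training)
lr = 2e-4
for _ in range(200):
    for f in range(F):
        e = W[:, f].conj() @ X[:, f, :] - S[f, :]   # output error in this bin
        W[:, f] -= lr * (X[:, f, :] @ e.conj())     # gradient w.r.t. conj(W)

err1 = mse(W)
print(err1 < err0)             # refinement reduces the target-speech error
```

The point of the sketch is the role of the two stages: the filter-and-sum initialization gives a sensible, interpretable starting point, and the second stage trades some of its fixed structure for a fit to the data, which is where the deep network's higher-order nonlinear mapping takes over in the proposed method.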