As a front-end pre-processing method for speech signals, speech enhancement is an active topic in digital speech signal research. Its aim is to recover a clean signal, as close as possible to the original speech, from noisy speech. Depending on the number of microphones used to collect the signal, speech enhancement is either single-channel or multi-channel. This thesis studies the two-channel case of multi-channel enhancement, because it matches the binaural nature of human hearing and makes full use of the spatial information in the speech signal. Since the rise of machine learning methods, many new speech enhancement algorithms have emerged. Among them, non-negative matrix factorization (NMF) works well for speech enhancement, and the non-negativity of the data gives the factorization a more practical physical meaning. The main work of this thesis is to propose an unsupervised speech enhancement algorithm that combines generalized cross-correlation with non-negative matrix factorization, and to address its shortcomings according to practical needs. The details are as follows:

1) Several typical traditional speech enhancement algorithms are briefly studied and analyzed, and the principle and characteristics of the basic non-negative matrix factorization method are introduced. The multi-channel non-negative matrix factorization enhancement algorithm and the sound source localization method for microphone arrays are also described.

2) To address the fact that traditional single-channel speech enhancement algorithms do not use the spatial information of the speech signal, a speech enhancement algorithm combining generalized cross-correlation with non-negative matrix factorization is proposed. The method first pre-learns a dictionary from the input mixed signal, then randomly initializes the activation coefficient vector and updates it iteratively, so that the activation coefficients of the input mixed speech signal with respect to the pre-learned dictionary are obtained frame by frame. In addition, max-pooled generalized cross-correlation with phase transform (GCC-PHAT) is used to localize the target source online, which preserves the real-time performance of the algorithm while markedly improving the quality and intelligibility of the reconstructed speech.

3) To address the inherent delay of speech enhancement algorithms based on the short-time Fourier transform (STFT), an asymmetric STFT windowing method is proposed, in which the traditional symmetric window is replaced and a shorter synthesis window is used to achieve low algorithmic latency. Experiments show that the algorithm reduces the inherent algorithmic delay to 2 ms without degrading speech quality or intelligibility.

Given these characteristics, the algorithm has practical value.
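
The two building blocks named in item 2), the frame-by-frame activation update against a fixed dictionary and the GCC-PHAT delay estimate, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the thesis implementation: the abstract does not give the NMF cost function or update rule, so Kullback-Leibler multiplicative updates are assumed here, and the function names `nmf_activations` and `gcc_phat`, their parameters, and the constant `EPS` are hypothetical. The max-pooling step and the asymmetric windowing are not shown.

```python
import numpy as np

EPS = 1e-12  # small constant guarding against division by zero


def nmf_activations(v, W, n_iter=100, rng=None):
    """Estimate non-negative activations h with v ~= W @ h, keeping W fixed.

    v: magnitude spectrum of one frame, shape (n_freq,)
    W: pre-learned non-negative dictionary, shape (n_freq, n_atoms)
    Assumption: KL-divergence multiplicative updates (not stated in the source).
    """
    rng = np.random.default_rng() if rng is None else rng
    h = rng.random(W.shape[1]) + 1e-3   # random positive initialization
    col_sums = W.sum(axis=0) + EPS      # W^T 1, constant over the iterations
    for _ in range(n_iter):
        ratio = v / (W @ h + EPS)       # element-wise v / (W h)
        h *= (W.T @ ratio) / col_sums   # multiplicative update keeps h >= 0
    return h


def gcc_phat(x_left, x_right, max_lag):
    """GCC-PHAT between two channels for lags -max_lag..max_lag (max_lag >= 1).

    A positive lag at the peak means x_right is delayed relative to x_left.
    Returns (lags, values), both of length 2 * max_lag + 1.
    """
    n = len(x_left) + len(x_right)      # zero-pad to avoid circular wrap-around
    X_l = np.fft.rfft(x_left, n=n)
    X_r = np.fft.rfft(x_right, n=n)
    cross = X_r * np.conj(X_l)          # cross-power spectrum
    cross /= np.abs(cross) + EPS        # phase transform: discard magnitude
    cc = np.fft.irfft(cross, n=n)       # back to the lag domain
    lags = np.arange(-max_lag, max_lag + 1)
    values = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return lags, values
```

In a two-channel setting, `nmf_activations` would be called once per STFT frame with the pre-learned dictionary, and the lag at the maximum of the `gcc_phat` values would serve as the delay estimate for the target source. How the thesis combines the two, including the max-pooling, is not specified in the abstract.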