
Binaural Speech Separation Research Based On Deep Learning

Posted on: 2021-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Lin
Full Text: PDF
GTID: 2518306476450264
Subject: Signal and Information Processing
Abstract/Summary:
Speech separation is widely used in speech signal processing systems and artificial intelligence systems. In real environments, traditional speech separation algorithms generalize poorly under low signal-to-noise ratio (SNR) and high reverberation. Drawing on the characteristics of human auditory perception, this thesis studies binaural speech separation based on deep neural networks, using both spatial features and spectral features. It proposes two algorithms: a convolutional neural network (CNN) binaural speech separation method based on the information of preceding and following frames, and a deep clustering speech separation algorithm based on the spectrogram and spatial features.

(1) CNN binaural speech separation method based on preceding and following frame features. A Gammatone filter bank simulates the time-frequency analysis performed by the human ear and decomposes the original speech signal into time-frequency units. Binaural spatial feature parameters are extracted from each time-frequency unit, including the cross-correlation function (CCF), the interaural time difference (ITD) and the interaural level difference (ILD). Earlier speech separation algorithms used only the information of the current frame, whereas this thesis exploits the temporal continuity of speech. After the spatial features are extracted, the spatial cues of the two frames before and after the current frame are concatenated with those of the current frame, and the resulting interaural spatial feature map is used as the input of the convolutional neural network. The source-to-artifacts ratio (SAR), source-to-interference ratio (SIR), source-to-distortion ratio (SDR) and Perceptual Evaluation of Speech Quality (PESQ) are used as evaluation metrics for speech separation. The simulation results show that at low SNR this algorithm performs significantly better than a deep neural network (DNN) method based on the ideal binary mask (IBM).

(2) Deep clustering speech separation algorithm based on the spectrogram and spatial features. Because speech is correlated over time, a recurrent neural network (RNN) can model speech signals better. In this thesis, a bi-directional long short-term memory network (Bi-LSTM) is used as the encoder, with the log-magnitude spectrum of the speech signal and the interaural phase difference (IPD) as the input feature vector, and the time-frequency units are mapped into a high-dimensional space. At test time, the high-dimensional vectors are grouped by K-Means clustering to classify the time-frequency units, and the result is combined with the mixed speech to reconstruct the target signal. Experimental results show that the deep clustering algorithm makes full use of spectral and spatial information. Compared with the CNN-based network, it achieves a significant improvement in SAR, SIR and SDR, and has good separation performance...
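The spatial cue extraction and frame concatenation described in method (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names `binaural_cues` and `stack_context`, the lag range `max_lag`, the sign convention for the ITD, the edge padding at utterance boundaries, and the choice of two context frames on each side are all assumptions made for illustration, and the Gammatone filtering step is omitted (the inputs are assumed to be the left and right signals of one time-frequency unit).

```python
import numpy as np


def binaural_cues(left, right, fs, max_lag):
    """Return (ccf, itd_seconds, ild_db) for one time-frequency unit.

    Assumed convention: ccf[lag] = sum_n left[n + lag] * right[n], so a
    positive ITD means the left-ear signal lags the right-ear signal.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    left = left - left.mean()
    right = right - right.mean()
    eps = 1e-12  # guards against division by zero on silent units

    lags = np.arange(-max_lag, max_lag + 1)
    ccf = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            ccf[i] = np.sum(left[lag:] * right[:len(right) - lag])
        else:
            ccf[i] = np.sum(left[:lag] * right[-lag:])
    # Normalize so that identical, aligned signals give a peak of 1.
    ccf /= np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)) + eps

    itd = lags[np.argmax(ccf)] / fs  # lag of the CCF peak, in seconds
    ild = 10.0 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))
    return ccf, itd, ild


def stack_context(frames, context=2):
    """Stack each frame with `context` neighbors on both sides.

    frames: array of shape (num_frames, feature_dim).
    Returns an array of shape (num_frames, 2 * context + 1, feature_dim);
    boundary frames are padded by repeating the first or last frame.
    """
    frames = np.asarray(frames)
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    width = 2 * context + 1
    return np.stack([padded[t:t + width] for t in range(len(frames))])


# Synthetic check: the left channel is the right channel delayed by
# 5 samples and attenuated by a factor of 0.5.
rng = np.random.default_rng(0)
fs = 16000
right_sig = rng.standard_normal(320)
left_sig = 0.5 * np.roll(right_sig, 5)
ccf, itd, ild = binaural_cues(left_sig, right_sig, fs, max_lag=16)

# One feature row per frame (here the same unit repeated for 6 frames),
# then stacked into a (frames, 5, feature_dim) map for a CNN-style input.
row = np.concatenate([ccf, [itd, ild]])
feature_map = stack_context(np.tile(row, (6, 1)), context=2)
```

In this sketch each frame contributes one row containing the CCF, ITD and ILD, and the stacked rows play the role of the interaural spatial feature map; how the thesis arranges the features across Gammatone channels is not stated in the abstract.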
Keywords/Search Tags:Binaural Speech Separation, Convolutional Neural Network, Deep Cluster, Computational Auditory Scene Analysis