Speech Separation Based On Microphone Array And Deep Learning

Posted on: 2021-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Chen
Full Text: PDF
GTID: 2518306476950249
Subject: Signal and Information Processing
Abstract/Summary:
As the front end of a speech signal processing system, speech separation directly affects the performance of the whole system. The performance of traditional speech separation algorithms degrades severely in environments with high reverberation and a low signal-to-noise ratio (SNR). Building on computational auditory scene analysis (CASA), this dissertation combines the spatial information of a microphone array with deep neural networks and proposes two speech separation algorithms: a DNN microphone array speech separation algorithm based on an improved Steered Response Power with Phase Transform (SRP-PHAT) feature, and a microphone array speech separation algorithm based on a Temporal Convolution Residual Neural Network (TC-ResNet).

(1) DNN microphone array speech separation algorithm based on improved SRP-PHAT. The algorithm uses the spatial features of the microphone array to achieve speech separation. Existing separation algorithms mainly address binaural speech separation, which can only separate speech arriving from the frontal directions. The proposed algorithm combines the SRP-PHAT spatial feature of the microphone array with the Gammatone auditory filter bank to form an improved SRP-PHAT feature for multi-speaker speech separation. A DNN model is trained on these spatial features, with 36 azimuths as the training targets. The test environments include noise and reverberation. Simulation results show that the algorithm achieves omnidirectional speech separation and still performs well in low-SNR, high-reverberation environments.

(2) Microphone array speech separation algorithm based on TC-ResNet. A convolutional neural network (CNN) is used to exploit the temporal structure of the speech signal, and temporal convolution and residual blocks are added to the network. Temporal convolution not only enlarges the receptive field of the lower convolution layers but also significantly reduces the amount of network computation. Residual blocks are used to combine features of different resolutions. Simulation results show that the TC-ResNet-based microphone array speech separation algorithm performs better in low-SNR, high-reverberation environments.
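
As a rough illustration of the SRP-PHAT spatial feature mentioned in the abstract, the following is a minimal sketch of plain SRP-PHAT over a grid of 36 candidate azimuths. It is not the improved, Gammatone-based feature proposed in the thesis, whose details the abstract does not give. The four-microphone circular array, the 5 cm radius, the 10-degree azimuth spacing, the frequency bins, the speed of sound, and all function names (`arrival_delays`, `srp_phat`, `simulate_spectra`) are assumptions made only for this example.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def arrival_delays(mic_xy, azimuth_deg):
    """Far-field arrival delay in seconds at each microphone, relative to the array origin."""
    a = math.radians(azimuth_deg)
    ux, uy = math.cos(a), math.sin(a)  # unit vector pointing toward the source
    # A microphone with a larger projection onto u is closer to the source and hears it earlier.
    return [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in mic_xy]


def srp_phat(spectra, freqs, mic_xy, azimuths_deg):
    """Return the PHAT-weighted steered response power for each candidate azimuth.

    spectra[m][k] is the complex spectrum of microphone m at frequency freqs[k] (one frame).
    """
    n_mics = len(spectra)
    # PHAT weighting: keep only the phase of each pairwise cross-spectrum.
    phases = {}
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            pair = []
            for a, b in zip(spectra[i], spectra[j]):
                c = a * b.conjugate()
                pair.append(c / (abs(c) + 1e-12))
            phases[(i, j)] = pair

    power = []
    for az in azimuths_deg:
        tau = arrival_delays(mic_xy, az)
        total = 0.0
        for (i, j), pair in phases.items():
            dt = tau[i] - tau[j]  # inter-microphone delay expected for this azimuth
            for f, p in zip(freqs, pair):
                # Steering: undo the expected phase shift; terms add coherently at the true azimuth.
                total += (p * cmath.exp(2j * math.pi * f * dt)).real
        power.append(total)
    return power


def simulate_spectra(mic_xy, freqs, azimuth_deg, seed=0):
    """Noise-free, anechoic far-field source: each microphone sees a delayed copy of one spectrum."""
    rng = random.Random(seed)
    source = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in freqs]
    tau = arrival_delays(mic_xy, azimuth_deg)
    return [
        [s * cmath.exp(-2j * math.pi * f * t) for s, f in zip(source, freqs)]
        for t in tau
    ]


# Hypothetical setup: 4 microphones on a circle of radius 5 cm, 36 azimuths in 10-degree steps.
mic_xy = [(0.05 * math.cos(k * math.pi / 2), 0.05 * math.sin(k * math.pi / 2)) for k in range(4)]
freqs = [200.0 + 100.0 * k for k in range(30)]  # 200 Hz to 3100 Hz
azimuths = [10 * k for k in range(36)]

spectra = simulate_spectra(mic_xy, freqs, azimuth_deg=130)
power = srp_phat(spectra, freqs, mic_xy, azimuths)
estimated = azimuths[max(range(len(azimuths)), key=power.__getitem__)]
print(estimated)  # 130 for this noise-free simulation
```

In this idealized setting the power peaks at the simulated direction. With noise and reverberation, as in the test conditions the abstract describes, the peak becomes less reliable, which is presumably why the thesis feeds the spatial features to a DNN rather than taking the maximum directly.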
Keywords/Search Tags: deep neural network, speech separation, computational auditory scene analysis, spatial features, convolutional neural network