Speech Separation Based On Deep Learning

Posted on: 2018-05-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Zhang
GTID: 1318330542980077
Subject: Computer application technology
Abstract/Summary:
Speech separation is the task of separating target speech from background noise. It removes noise and improves speech quality and intelligibility. Speech separation has a wide range of applications, including hearing prostheses, mobile telecommunication, and robust automatic speech and speaker recognition. Deep learning based speech separation formulates the problem as a supervised learning task: a learning machine is trained to map noisy speech to the target clean speech. This approach has achieved considerable performance improvements over conventional methods and has become a promising research area. This thesis studies deep learning based speech separation, investigates several concrete problems in depth, and proposes new methods. The main contributions are as follows:

1. We propose to combine mapping-based and masking-based training targets within an ensemble learning framework. Exploiting the complementarity of the two types of targets, this work builds a multi-target deep neural network (DNN) for speech separation and a multilayer perceptron (MLP) for merging the estimates. The merging MLP and the separation DNN are then connected and trained jointly. The resulting joint model improves separation performance.

2. We propose to use a convolutional neural network (CNN) for pitch estimation. This work analyzes the harmonic structure of speech and shows that it is shift-invariant; the CNN is used to model this shift invariance. This improves pitch estimation performance.

3. We propose to combine speech separation and pitch estimation, because the two tasks can boost each other. Both tasks are embedded into a deep stacking network (DSN). On the one hand, the pitch-based features from pitch estimation benefit speech separation and improve its performance. On the other hand, speech separation removes noise, which makes pitch estimation easier and more accurate. Running the two steps iteratively improves performance on both tasks.

4. We propose to combine monaural speech separation with multi-channel microphone array beamforming. Speech separation removes noise from the multi-channel signal, which makes steering vector estimation more accurate and thus improves beamforming. In turn, the beamforming output carries cross-channel information that is useful for monaural speech separation. Since the two tasks can boost each other, we embed them into a DSN, and the performance of both improves.

This thesis begins by introducing speech separation methods and analyzing their strengths and weaknesses. It then describes in detail the processing pipeline, architectures, and research methods of deep learning based speech separation. New methods are proposed and an experimental system is built. Experimental results show that the proposed methods improve separation performance.
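Contribution 1 rests on the idea that a mapping target (the clean magnitude itself) and a masking target (a ratio mask applied to the noisy magnitude) make different errors, so a learned merger can do better than either alone. The sketch below is a simplified illustration under stated assumptions, not the thesis's model: the two DNN outputs are simulated by adding independent noise to oracle targets, the ideal ratio mask uses one common definition with exponent `beta = 0.5`, and the merging MLP is replaced by a linear least-squares combiner. All function and variable names are hypothetical.

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5):
    """One common IRM form, (S^2 / (S^2 + N^2))^beta, per time-frequency unit."""
    eps = 1e-12
    return (clean_mag**2 / (clean_mag**2 + noise_mag**2 + eps)) ** beta

def fit_linear_merger(est_a, est_b, target):
    """Least-squares weights w with w[0]*est_a + w[1]*est_b + w[2] ~ target.
    A simplified linear stand-in (assumption) for the merging MLP."""
    X = np.stack([est_a.ravel(), est_b.ravel(), np.ones(est_a.size)], axis=1)
    w, *_ = np.linalg.lstsq(X, target.ravel(), rcond=None)
    return w

def apply_merger(w, est_a, est_b):
    return w[0] * est_a + w[1] * est_b + w[2]

def mse(estimate, reference):
    return float(np.mean((estimate - reference) ** 2))

rng = np.random.default_rng(0)
clean = np.abs(rng.normal(size=(100, 64)))  # hypothetical clean magnitude spectrogram
noise = np.abs(rng.normal(size=(100, 64)))  # hypothetical noise magnitude spectrogram
noisy = clean + noise                       # crude magnitude-domain mixture (ignores phase)

irm = ideal_ratio_mask(clean, noise)

# Simulated outputs of the two branches, with independent errors:
mapped_est = clean + 0.3 * rng.normal(size=clean.shape)            # mapping branch: estimates |S| directly
mask_est = np.clip(irm + 0.1 * rng.normal(size=irm.shape), 0.0, 1.0)
masked_est = mask_est * noisy                                      # masking branch: mask times noisy magnitude

w = fit_linear_merger(mapped_est, masked_est, clean)
merged = apply_merger(w, mapped_est, masked_est)

print(f"mapping {mse(mapped_est, clean):.4f}  "
      f"masking {mse(masked_est, clean):.4f}  "
      f"merged {mse(merged, clean):.4f}")
```

Because the least-squares fit can always reproduce either input alone (weights `(1, 0, 0)` or `(0, 1, 0)`), the merged error on the fitting data is never larger than that of the better branch. A nonlinear MLP trained jointly with the separation network, as the abstract describes, is a more flexible version of the same idea.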
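One common way to see the shift property behind contribution 2 is to place the spectrum on a log-frequency axis: harmonic k lies log2(k) octaves above the fundamental, so changing the pitch shifts the whole harmonic pattern without changing its shape, and a convolution along frequency responds to that pattern wherever it appears. The abstract does not say which representation or network the thesis uses. This NumPy sketch assumes a log-frequency grid (`BINS_PER_OCTAVE`, `N_BINS`, and `F_MIN` are arbitrary choices) and a single hand-made comb filter standing in for a learned CNN kernel; all names are hypothetical.

```python
import numpy as np

BINS_PER_OCTAVE = 48  # assumed log-frequency resolution
N_BINS = 400          # assumed number of log-frequency bins
F_MIN = 50.0          # assumed frequency (Hz) of bin 0
N_HARMONICS = 5

# Harmonic k sits log2(k) octaves above F0, i.e. at a fixed bin offset
# that does not depend on F0: [0, 48, 76, 96, 111] for these settings.
HARMONIC_OFFSETS = np.round(
    BINS_PER_OCTAVE * np.log2(np.arange(1, N_HARMONICS + 1))).astype(int)

def harmonic_spectrum(f0_bin):
    """Toy log-frequency spectrum with decaying harmonics of a fundamental at f0_bin."""
    x = np.zeros(N_BINS)
    x[f0_bin + HARMONIC_OFFSETS] = 1.0 / np.arange(1, N_HARMONICS + 1)
    return x

# One fixed "filter": a comb matching the harmonic offsets (a CNN would learn its filters).
KERNEL = np.zeros(HARMONIC_OFFSETS[-1] + 1)
KERNEL[HARMONIC_OFFSETS] = 1.0

def conv_response(x):
    """Sliding dot product along log-frequency (cross-correlation, as in CNN layers)."""
    return np.correlate(x, KERNEL, mode="valid")

def estimate_f0_bin(x):
    return int(np.argmax(conv_response(x)))

def bin_to_hz(b):
    return F_MIN * 2.0 ** (b / BINS_PER_OCTAVE)

for b in (60, 100, 150):
    est = estimate_f0_bin(harmonic_spectrum(b))
    print(f"true bin {b}, estimated bin {est}, about {bin_to_hz(est):.1f} Hz")

# Shifting the input by d bins shifts the filter response by exactly d bins.
d = 20
r1 = conv_response(harmonic_spectrum(100))
r2 = conv_response(harmonic_spectrum(100 + d))
print(np.allclose(r2[d:], r1[:-d]))  # True
```

In a real CNN, many learned filters plus nonlinearities and pooling build on this same equivariance; the single fixed comb here only illustrates the principle.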
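Contribution 4 uses the separation output to make steering vector estimation more accurate. A widely used way to do this in mask-based beamforming is to weight each time-frequency frame by the estimated speech mask when accumulating spatial covariance matrices, take the principal eigenvector of the speech covariance as the steering vector, and build an MVDR beamformer from it. The abstract does not specify the beamformer, so this single-frequency-bin NumPy sketch is an assumption: an oracle speech-presence mask stands in for the network's estimated mask, and the microphone count, frame count, and signal powers are arbitrary. Function names are hypothetical.

```python
import numpy as np

def masked_covariance(Y, weights):
    """Mask-weighted spatial covariance for one frequency bin.
    Y: (M, T) complex STFT frames from M microphones; weights: (T,) in [0, 1]."""
    return (Y * weights) @ Y.conj().T / np.maximum(weights.sum(), 1e-12)

def steering_vector(phi_speech):
    """Principal eigenvector of the speech covariance as the steering vector estimate."""
    _, vecs = np.linalg.eigh(phi_speech)  # eigenvalues come back in ascending order
    return vecs[:, -1]

def mvdr_weights(phi_noise, d):
    """MVDR beamformer w = Phi_n^{-1} d / (d^H Phi_n^{-1} d)."""
    num = np.linalg.solve(phi_noise, d)
    return num / (d.conj() @ num)

# Synthetic single-bin simulation (all values are assumptions for illustration).
rng = np.random.default_rng(1)
M, T = 4, 2000
d_true = np.exp(1j * rng.uniform(0, 2 * np.pi, M))  # hypothetical true steering vector
active = rng.random(T) < 0.5                         # frames where speech is present
s = np.sqrt(10 / 2) * (rng.normal(size=T) + 1j * rng.normal(size=T)) * active
A = rng.normal(size=(M, M)) / np.sqrt(M)             # correlates the noise across channels
noise = A @ ((rng.normal(size=(M, T)) + 1j * rng.normal(size=(M, T))) / np.sqrt(2))
Y = np.outer(d_true, s) + noise

# Oracle mask as a stand-in for the separation network's mask estimate.
mask = active.astype(float)
phi_s = masked_covariance(Y, mask)
phi_n = masked_covariance(Y, 1.0 - mask)
d_hat = steering_vector(phi_s)
w = mvdr_weights(phi_n, d_hat)
enhanced = w.conj() @ Y  # beamformer output w^H y for every frame, shape (T,)

cos_sim = abs(np.vdot(d_hat, d_true)) / (np.linalg.norm(d_hat) * np.linalg.norm(d_true))
print(f"steering-vector cosine similarity: {cos_sim:.3f}")
print(f"distortionless response w^H d_hat: {np.vdot(w, d_hat):.3f}")
```

The MVDR weights pass the estimated steering direction with unit gain while minimizing output noise power; the more accurately the mask separates speech-dominated frames from noise-dominated ones, the closer `d_hat` comes to the true steering vector.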
Keywords/Search Tags: Speech separation, Deep learning, Pitch estimation, Deep neural network (DNN), Microphone array, Speech enhancement