Speech Separation Based On Deep Learning

Posted on:2018-05-29

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H Zhang

GTID:1318330542980077

Subject:Computer application technology

Abstract/Summary:

Speech separation extracts target speech from background noise. It removes the noise and improves speech quality and intelligibility. Speech separation has a wide range of applications, including hearing prostheses, mobile telecommunication, and robust automatic speech and speaker recognition. Deep learning based speech separation formalizes the problem as a machine learning task: it trains a learning machine to map noisy speech to the target clean speech. This approach has achieved considerable performance improvements over conventional approaches and has become a promising research area. This thesis works on deep learning based speech separation, studies several concrete problems in this area, and proposes some new methods. The main contributions of our research are described as follows:

1. We propose to combine mapping-based and masking-based training targets within an ensemble learning framework. This work exploits the complementarity of these two types of targets. It builds a multi-target deep neural network (DNN) for speech separation and a multilayer perceptron (MLP) for merging the estimates. The merging MLP and the separation DNN are then connected and trained jointly. The proposed joint model improves the separation performance.

2. We propose to use a convolutional neural network (CNN) for the pitch estimation task. This work analyzes the harmonic structure of speech and shows that it is shift-invariant. A CNN is used to model this shift-invariance. This work improves the performance of pitch estimation.

3. We propose to combine the speech separation and pitch estimation tasks because they can boost each other. The two tasks are embedded into a deep stacking network (DSN). On the one hand, the pitch-based feature from pitch estimation contributes to speech separation and improves its performance. On the other hand, speech separation removes the noise, which makes pitch estimation easier and improves its accuracy. These two steps run iteratively, and the performance of both tasks improves.

4. We propose to combine monaural speech separation with multi-channel microphone array beamforming. Speech separation removes the noise from the multi-channel signal, which makes the steering vector estimation more accurate and improves the beamforming. The output of the beamformer carries cross-channel information that is useful for monaural speech separation. Therefore, these two tasks can boost each other. We embed them into a DSN, and the performance of both improves.

This thesis starts by introducing existing speech separation methods and then analyzes their strengths and weaknesses. We describe in detail the processing, architecture, and research methods of deep learning based speech separation. New methods are proposed, and an experimental system is built. Experimental results show that the proposed methods improve the separation performance.
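The difference between the two target families named in contribution 1 can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation. The ideal ratio mask shown here is one common choice of masking target, the log-magnitude spectrum is one common choice of mapping target, and the fixed-weight average in `merge_estimates` is a hypothetical stand-in for the learned merging MLP. All function names and the toy data are assumptions made for illustration.

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Masking-based target: per-bin ratio of speech to total energy, in [0, 1]."""
    s2 = clean_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

def mapping_target(clean_mag, eps=1e-8):
    """Mapping-based target: the clean log-magnitude spectrum itself."""
    return np.log(clean_mag + eps)

def merge_estimates(mask_est, mapped_log_mag, noisy_mag, weight=0.5):
    """Fixed-weight average of the two clean-magnitude estimates.

    Assumption: this stands in for the learned merging MLP, which the
    abstract does not specify in detail.
    """
    from_mask = mask_est * noisy_mag        # apply the mask to the noisy input
    from_mapping = np.exp(mapped_log_mag)   # undo the log compression
    return weight * from_mask + (1.0 - weight) * from_mapping

# Toy magnitude spectrograms: 4 frames x 5 frequency bins (hypothetical data).
rng = np.random.default_rng(0)
clean_mag = rng.random((4, 5)) + 0.1
noise_mag = rng.random((4, 5)) + 0.1
# Assumes speech and noise are uncorrelated, so their energies add.
noisy_mag = np.sqrt(clean_mag ** 2 + noise_mag ** 2)

irm = ideal_ratio_mask(clean_mag, noise_mag)
log_target = mapping_target(clean_mag)
merged = merge_estimates(irm, log_target, noisy_mag)
```

With these oracle targets both estimates recover the clean magnitude, so the merged result does too. In a trained system the two network outputs would carry different errors, which is the complementarity a learned merger could exploit.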

Keywords/Search Tags:

Speech separation, Deep learning, Pitch estimation, Deep neural network (DNN), Microphone array, Speech enhancement

Related items

1	Speech Separation Based On Microphone Array And Deep Learning
2	Deep Learning-based Speech Enhancement With Microphone Array
3	Study On Speech Enhancement With Reverberation
4	Research And Implementation Of Speech Enhancement Based On Microphone Array And Deep Learning
5	Research On Single-channel Speech Separation Technology Based On Deep Learning
6	Single Channel Speech Separation Methods Based On Deep Neural Network
7	Speech Recognition Front-End Processing Based On Deep Neural Network
8	Research On Deep Learning-based Identification Of Multi-speech Sources Using A Small-scale Microphone Array
9	Binaural Speech Separation Research Based On Deep Learning
10	Research And Design Of Speech Separation Algorithm Based On Deep Learning