Monaural Multi-speaker Speech Separation And Recognition

Posted on:2020-10-21

Degree:Master

Type:Thesis

Country:China

Candidate:X K Chang

GTID:2428330620459985

Subject:Computer Science and Technology

Abstract/Summary:

The cocktail party problem, i.e., tracing and distinguishing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the most critical problems in speech processing. Despite all the progress that has been made in automatic speech recognition (ASR), significant performance degradation is still observed when recognizing multi-talker mixed speech. Driven by recent progress in deep learning, researchers have proposed many deep-learning-based methods for multi-speaker speech separation and recognition. In this work, we explore permutation invariant training (PIT) for monaural multi-speaker speech separation and recognition, and propose three main approaches. First, we take the ASR criterion as the final objective: we train the monaural multi-speaker speech separation/recognition model using speech feature separation and speech recognition as criteria, and we apply joint learning to combine the two. Furthermore, we introduce a gated convolutional network and an attention mechanism to improve speech recognition performance. Second, to address the mismatch between the training and evaluation data, we propose a speaker adaptive training technique that uses auxiliary features in the monaural multi-speaker speech recognition task, and we also apply multi-task learning with the auxiliary feature as a second task. Third, we apply end-to-end models, which have recently become popular in automatic speech recognition, to the multi-speaker task and present a state-of-the-art monaural multi-speaker end-to-end ASR model. All the proposed methods were evaluated on two artificially synthesized corpora, AMI-mix and WSJ-mix. The results show that the PIT-based monaural multi-speaker speech recognition model achieves a significant reduction in word error rate (WER) compared with conventional speech recognition systems.
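
As a rough illustration of the permutation invariant training idea named in the abstract, the Python sketch below scores every assignment of model output streams to reference speakers and keeps the assignment with the lowest loss. It is not the thesis's implementation: the function names `pit_loss` and `mse`, the choice of mean squared error as the per-pair loss, and the toy feature vectors are all assumptions made for illustration.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length feature sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(outputs, targets, pair_loss=mse):
    """Return (lowest loss, best permutation) over all output-to-target assignments.

    outputs[i] is the i-th output stream of the model and targets[j] is the
    reference for speaker j. best_perm[i] is the index of the target assigned
    to output stream i.
    """
    if len(outputs) != len(targets):
        raise ValueError("need the same number of output streams and targets")
    best_loss, best_perm = float("inf"), None
    # Exhaustive search: S! permutations for S speakers, cheap for S = 2 or 3.
    for perm in permutations(range(len(targets))):
        loss = sum(
            pair_loss(outputs[i], targets[p]) for i, p in enumerate(perm)
        ) / len(outputs)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example (hypothetical data): the model emitted the two speakers in
# swapped order, so the best assignment maps output 0 to target 1 and
# output 1 to target 0.
outputs = [[1.0, 2.0], [5.0, 6.0]]
targets = [[5.0, 6.0], [1.0, 2.0]]
print(pit_loss(outputs, targets))  # -> (0.0, (1, 0))
```

Because the loss is taken over the best assignment rather than a fixed one, the training signal does not depend on the arbitrary order in which the speakers are labeled in the references. In a recognition setting, the per-pair loss would be a recognition criterion rather than the feature-level error assumed here.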

Keywords/Search Tags:

Neural Network, Permutation Invariant Training, Cocktail-Party Problem, Speaker Adaptive Training, Speech Recognition

Related items

1	Research On Speaker Adaptation Of Neural Network Acoustic Models For Speech Recognition
2	Speaker Adaptation Of DNN-HMM Acoustic Model For Speech Recognition
3	Research And Application Of Chinese Text-to-speech Based On Recurrent Neural Network
4	Study On The Method Of Speaker-Dependent Isolated Word Speech Recognition
5	Alternative regularized neural network architectures for speech and speaker recognition
6	Research On Adaptive Recognition Of Different Accent Conversations Based On Convolutional Neural Network
7	Research On The Method Of Speaker-specific Speech Signal Recognition
8	Short Speech Speaker Recognition Method Based On Deep Learning And Its Application In Speech Separation
9	Research On Speaker Recognition Method Based On Deep Learning
10	Research On Speaker Adaptation In Speech Recognition