
Research And Implementation Of Multi-speaker Speech Separation Technology Based On Deep Learning

Posted on: 2021-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: J F Wang
Full Text: PDF
GTID: 2428330647457223
Subject: Computer application technology
Abstract/Summary:
Sound signals are one of the principal channels through which people obtain information from the outside world. In today's era of information globalization, the volume of information exchange has grown so large that processing and distinguishing it by human effort alone is no longer feasible, and computers are now used to perform this work. Within speech signal processing, alongside speech recognition and voiceprint recognition, there is a further branch: speech separation. Speech separation techniques can be divided into many categories according to the type of audio input and the application scenario. This thesis studies multi-speaker speech separation based on deep learning.

There are many ways to implement speech separation. This thesis frames it as a binary classification problem: the target speaker's voice is treated as the positive class, and all other, uninteresting voices as the negative class. Extracting the positive speech of interest from the mixture achieves separation for one speaker; repeating the procedure with different target speakers separates every speaker in turn. It follows that a prerequisite of this approach is having speech information for each separation target available as a positive-example reference, so that the model can extract the corresponding speaker's voice from the multi-speaker mixture. The multi-speaker speech separation method in this thesis is designed along these lines.

The technical work divides into two parts: single-speaker speech segment acquisition and multi-speaker separation model training. First, BIC-based segmentation together with hierarchical clustering, K-means, and the Gap Statistic is used to segment and cluster the input mixed speech, yielding voiceprint information for each speaker. Then, using deep learning methods, LSTM and CNN networks are used to design and train a multi-speaker speech separation model. Finally, in an end-to-end manner, the mixed audio is fed into the model, which, guided by the acquired voiceprint information, outputs the separated speech of each speaker.
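The BIC-based segmentation step mentioned above can be sketched as a change-point test: for each candidate frame, the ΔBIC score compares modeling a window with one Gaussian versus two, and a positive peak suggests a speaker change. This is a minimal illustration on synthetic features; the feature dimension, window range, step size, and penalty weight λ below are illustrative assumptions, not the thesis's actual configuration:

```python
import numpy as np

def delta_bic(x, t, lam=1.0):
    """Delta-BIC for a candidate speaker change at frame t of feature matrix x.

    x: (N, d) array of acoustic features (e.g. MFCCs). A positive value
    favors the two-speaker hypothesis, i.e. a change point at t.
    """
    n, d = x.shape
    x1, x2 = x[:t], x[t:]
    # log-determinant of the sample covariance, regularized for stability
    logdet = lambda s: np.linalg.slogdet(np.cov(s, rowvar=False)
                                         + 1e-6 * np.eye(d))[1]
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return 0.5 * (n * logdet(x) - t * logdet(x1) - (n - t) * logdet(x2)) - penalty

# Two synthetic "speakers" with shifted feature means; true change at frame 200.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, (200, 13)),
               rng.normal(3.0, 1.0, (200, 13))])
candidates = range(50, 350, 10)
scores = [delta_bic(x, t) for t in candidates]
best = candidates[int(np.argmax(scores))]  # peaks near the true boundary
```

In practice this test is run over a sliding window, and the detected boundaries cut the recording into single-speaker segments for the clustering stage.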
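For the clustering stage, the Gap Statistic chooses the number of speakers by comparing the within-cluster dispersion of the data against that of uniform reference data. The sketch below pairs a plain Lloyd's-algorithm K-means with the gap computation on synthetic "speaker embedding" clusters; the embedding dimension, cluster count, and reference-sample count are assumptions for illustration only:

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns labels and within-cluster dispersion W_k."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(0)
    wk = sum(((x[labels == j] - centers[j]) ** 2).sum() for j in range(k))
    return labels, wk

def gap_statistic(x, k, n_ref=10):
    """Gap(k) = E[log W_k(uniform reference)] - log W_k(data)."""
    _, wk = kmeans(x, k)
    lo, hi = x.min(0), x.max(0)
    refs = [kmeans(np.random.default_rng(s).uniform(lo, hi, x.shape), k, seed=s)[1]
            for s in range(n_ref)]
    return np.mean(np.log(refs)) - np.log(wk)

# Three well-separated synthetic "speaker" clusters in an 8-dim embedding space.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(c, 0.3, (60, 8)) for c in (0.0, 4.0, 8.0)])
gaps = {k: gap_statistic(x, k) for k in (1, 2, 3, 4, 5)}
best_k = max(gaps, key=gaps.get)  # the gap rises sharply up to the true k
```

The gap grows steeply while adding clusters still explains real structure and flattens once clusters start splitting single speakers, which is the signal used to stop.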
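The binary-classification framing of separation can be made concrete with time-frequency masks: each spectrogram bin is labeled positive if the target speaker dominates it and negative otherwise. The trained model predicts this mask from the mixture plus the target's voiceprint; the sketch below instead computes the ideal binary mask from oracle sources, i.e. the training label a mask-based separator would learn. All signals here are synthetic magnitudes, not the thesis's data:

```python
import numpy as np

rng = np.random.default_rng(2)
target = rng.rayleigh(1.0, (100, 257))        # |STFT| of the target speaker
interference = rng.rayleigh(1.0, (100, 257))  # |STFT| of all other speakers
mixture = target + interference               # magnitude-domain approximation

# Ideal binary mask: 1 = positive (target-dominated) bin, 0 = negative.
ibm = (target > interference).astype(np.float32)
estimate = ibm * mixture                      # masked mixture ~ target speech

# Keeping only target-dominated bins brings the estimate closer to the
# target than the unprocessed mixture is.
err_mix = np.abs(mixture - target).mean()
err_est = np.abs(estimate - target).mean()
```

Swapping in a different speaker's voiceprint flips which bins count as positive, which is how one model separates multiple speakers from the same mixture, one target at a time.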
Keywords/Search Tags:speech segmentation and clustering, multi-speaker, speech separation, deep learning, end-to-end