
Research And Implementation Of Multi-speaker Speech Separation Technology Based On Deep Learning

Posted on: 2021-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: J F Wang
Full Text: PDF
GTID: 2428330647457223
Subject: Computer application technology
Abstract/Summary:
Sound signals are one of the principal channels through which people obtain information from the outside world. In today's era of information globalization, the volume of information exchange has grown so large that processing and distinguishing it by human effort alone is no longer feasible, and computers are now used to perform this work. Within speech signal processing, alongside speech recognition and voiceprint recognition, there is a further branch: speech separation. Speech separation techniques can be divided into many categories according to the type of audio input and the application scenario. This thesis studies multi-speaker speech separation based on deep learning.

There are many ways to implement speech separation. This thesis frames it as a binary classification problem: the target speaker's voice is treated as the positive class, and all other, uninteresting voices as the negative class. Extracting the positive speech of interest from the mixture achieves separation for one speaker; repeating the procedure with different target speakers separates every speaker in turn. It follows that a prerequisite of this approach is having speech information for each separation target available as a positive-example reference, so that the model can extract the corresponding speaker's voice from the multi-speaker mixture. The multi-speaker speech separation method in this thesis is designed along these lines.

The technical work divides into two parts: single-speaker speech segment acquisition and multi-speaker separation model training. First, BIC-based segmentation together with hierarchical clustering, K-means, and the Gap Statistic is used to segment and cluster the input mixed speech, yielding voiceprint information for each speaker. Then, using deep learning methods, LSTM and CNN networks are used to design and train a multi-speaker speech separation model. Finally, in an end-to-end manner, the mixed audio is fed into the model, which, guided by the acquired voiceprint information, outputs the separated speech of each speaker.
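The BIC-based segmentation step mentioned above can be sketched as a change-point test: for each candidate frame, the ΔBIC score compares modeling a window with one Gaussian versus two, and a positive peak suggests a speaker change. This is a minimal illustration on synthetic features; the feature dimension, window range, step size, and penalty weight λ below are illustrative assumptions, not the thesis's actual configuration:

```python
import numpy as np

def delta_bic(x, t, lam=1.0):
    """Delta-BIC for a candidate speaker change at frame t of feature matrix x.

    x: (N, d) array of acoustic features (e.g. MFCCs). A positive value
    favors the two-speaker hypothesis, i.e. a change point at t.
    """
    n, d = x.shape
    x1, x2 = x[:t], x[t:]
    # log-determinant of the sample covariance, regularized for stability
    logdet = lambda s: np.linalg.slogdet(np.cov(s, rowvar=False)
                                         + 1e-6 * np.eye(d))[1]
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return 0.5 * (n * logdet(x) - t * logdet(x1) - (n - t) * logdet(x2)) - penalty

# Two synthetic "speakers" with shifted feature means; true change at frame 200.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, (200, 13)),
               rng.normal(3.0, 1.0, (200, 13))])
candidates = range(50, 350, 10)
scores = [delta_bic(x, t) for t in candidates]
best = candidates[int(np.argmax(scores))]  # peaks near the true boundary
```

In practice this test is run over a sliding window, and the detected boundaries cut the recording into single-speaker segments for the clustering stage.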
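For the clustering stage, the Gap Statistic chooses the number of speakers by comparing the within-cluster dispersion of the data against that of uniform reference data. The sketch below pairs a plain Lloyd's-algorithm K-means with the gap computation on synthetic "speaker embedding" clusters; the embedding dimension, cluster count, and reference-sample count are assumptions for illustration only:

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns labels and within-cluster dispersion W_k."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(0)
    wk = sum(((x[labels == j] - centers[j]) ** 2).sum() for j in range(k))
    return labels, wk

def gap_statistic(x, k, n_ref=10):
    """Gap(k) = E[log W_k(uniform reference)] - log W_k(data)."""
    _, wk = kmeans(x, k)
    lo, hi = x.min(0), x.max(0)
    refs = [kmeans(np.random.default_rng(s).uniform(lo, hi, x.shape), k, seed=s)[1]
            for s in range(n_ref)]
    return np.mean(np.log(refs)) - np.log(wk)

# Three well-separated synthetic "speaker" clusters in an 8-dim embedding space.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(c, 0.3, (60, 8)) for c in (0.0, 4.0, 8.0)])
gaps = {k: gap_statistic(x, k) for k in (1, 2, 3, 4, 5)}
best_k = max(gaps, key=gaps.get)  # the gap rises sharply up to the true k
```

The gap grows steeply while adding clusters still explains real structure and flattens once clusters start splitting single speakers, which is the signal used to stop.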
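The binary-classification framing of separation can be made concrete with time-frequency masks: each spectrogram bin is labeled positive if the target speaker dominates it and negative otherwise. The trained model predicts this mask from the mixture plus the target's voiceprint; the sketch below instead computes the ideal binary mask from oracle sources, i.e. the training label a mask-based separator would learn. All signals here are synthetic magnitudes, not the thesis's data:

```python
import numpy as np

rng = np.random.default_rng(2)
target = rng.rayleigh(1.0, (100, 257))        # |STFT| of the target speaker
interference = rng.rayleigh(1.0, (100, 257))  # |STFT| of all other speakers
mixture = target + interference               # magnitude-domain approximation

# Ideal binary mask: 1 = positive (target-dominated) bin, 0 = negative.
ibm = (target > interference).astype(np.float32)
estimate = ibm * mixture                      # masked mixture ~ target speech

# Keeping only target-dominated bins brings the estimate closer to the
# target than the unprocessed mixture is.
err_mix = np.abs(mixture - target).mean()
err_est = np.abs(estimate - target).mean()
```

Swapping in a different speaker's voiceprint flips which bins count as positive, which is how one model separates multiple speakers from the same mixture, one target at a time.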
Keywords/Search Tags:speech segmentation and clustering, multi-speaker, speech separation, deep learning, end-to-end