The Research Of Target-dependent Speech Separation Based On Deep Learning

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: X L Chen
GTID: 2518306104988509
Subject: Computer technology
Abstract/Summary:
As artificial intelligence has become one of the most active research fields, everyday life has grown steadily more intelligent. Smart homes are increasingly common, and the smart voice device that serves as their entry point has attracted considerable research interest. Obtaining clean speech from real environments is closely tied to improving speech recognition. This thesis addresses multi-speaker separation for a specific speaker and proposes a new concept, single label training, which differs from traditional methods in two respects. First, traditional multi-speaker separation is mostly target-independent: a model with strong generalization ability is trained on a large amount of varied data so that it adapts to many speakers. Such a model has an obvious disadvantage: it depends too heavily on the signal-to-noise ratio (SNR) at mixing time, that is, on the relative speech energy. When the speech energy of different speakers is equal or similar, its separation performance degrades sharply. Single label training instead focuses on a specific speaker, keeping that target speaker fixed throughout training while the interference is selected randomly from other speakers. During both training and testing, the target speech is mixed with non-target speech at different SNRs. Second, single label training applies target-independent methods to a target-dependent task, using a minimum cross loss to find the best match between target and non-target speech; this avoids the label permutation problem and separates both the target and the non-target speech from the mixture. We collected about 35 hours of Mandarin speech in total. The results show that single label training greatly improves separation compared with the control group. With mixing at 6 dB SNR, the SDR of single label training is higher than that of the control group by 17.4% for the target speaker and 7.6% for the interfering speaker. When the mixing SNR is reduced to 0 dB, the advantage of single label training widens: the target speaker's SDR increases by 123.8% and the non-target speaker's SDR by 130.8%. On another evaluation metric, PESQ, single label training also outperforms the traditional target-independent models.
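The mixing step and the loss described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `mix_at_snr` and `min_cross_loss` are hypothetical, SNR is taken to be the target-to-interference power ratio, and the "minimum cross loss" is assumed to mean the smaller of the two errors obtained by matching the two model outputs to the target and non-target references in either order, with mean squared error as the per-pair error. The abstract does not specify any of these details.

```python
import numpy as np


def mix_at_snr(target, interference, snr_db):
    """Mix two signals so the target-to-interference power ratio is snr_db.

    Assumption: SNR is defined on mean signal power over the whole clip.
    Returns the mixture and the rescaled interference.
    """
    target_power = np.mean(target ** 2)
    interference_power = np.mean(interference ** 2)
    # Choose gain so that
    # 10 * log10(target_power / (gain**2 * interference_power)) == snr_db.
    gain = np.sqrt(target_power / (interference_power * 10 ** (snr_db / 10)))
    scaled = gain * interference
    return target + scaled, scaled


def min_cross_loss(est_a, est_b, target, non_target):
    """Assumed form of the loss: the minimum over both output assignments.

    Each assignment is scored with mean squared error (an illustrative
    choice); taking the minimum means the order of the two model outputs
    does not matter.
    """
    direct = np.mean((est_a - target) ** 2) + np.mean((est_b - non_target) ** 2)
    swapped = np.mean((est_a - non_target) ** 2) + np.mean((est_b - target) ** 2)
    return min(direct, swapped)


# Usage with synthetic signals standing in for speech (one second at 16 kHz).
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
other = rng.standard_normal(16000)
mixture, non_target = mix_at_snr(target, other, snr_db=6.0)
```

At 0 dB the gain equalizes the two signal powers, which is the equal-energy case the abstract identifies as hardest for target-independent models.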
Keywords/Search Tags:Single Label Training, Speech Separation, Target-dependent, Mask Estimation, Deep Learning