
Research On IVA Based Speech Separation Algorithm

Posted on: 2022-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Gu
Full Text: PDF
GTID: 2518306725490404
Subject: Acoustics
Abstract/Summary:
With the development of audio and video conferencing systems and intelligent speech interaction devices, speech enhancement has received extensive attention in the fields of communication and artificial intelligence. Blind source separation (BSS) aims to retrieve individual source signals from their mixtures and plays an important role in speech enhancement. In particular, speech separation algorithms focus on separating speech signals and have been widely applied in multi-speaker scenarios. Independent vector analysis (IVA) is a widely acknowledged state-of-the-art (SOTA) algorithm for multichannel frequency-domain blind source separation, and it is highly competitive in practical applications because of its low computational complexity and stable separation performance. This thesis focuses on three aspects of IVA-based speech separation: the selection of speech models, the extraction of the target speech signal, and the solution to the global permutation problem.

IVA with a Gaussian mixture model (GMM) as the source prior has been shown to be effective for joint blind source separation (JBSS). However, an extra pre-training process is usually required to provide initial parameter values for successful speech separation. To remove the pre-training, an amplitude-variable Gaussian mixture model is proposed as the speech model by introducing a time-varying parameter. Experiments confirm the efficacy of the proposed method under random initialization.

Compared with BSS, target speech extraction is a more appropriate choice as an automatic speech recognition (ASR) front end because it directly outputs the enhanced target signal. A sequential approach to target speech extraction that combines BSS with an x-vector based speaker recognition (SR) module is investigated. Two variations of the IVA algorithm, namely independent low-rank matrix analysis (ILRMA) and the multichannel variational autoencoder (MVAE), together with two data augmentation strategies for training the SR module, are compared through numerous simulations in terms of separation performance and extraction accuracy.

To solve the global permutation problem of many frequency-domain BSS algorithms, a semi-supervised blind speech separation algorithm with a designated channel order is proposed by introducing a variational autoencoder (VAE) as the source model. Disentangled speaker information and content information of speech signals can be obtained owing to the instance normalization (IN) and adaptive instance normalization (AdaIN) strategies applied in the VAE network. A further denoising training stage for the decoder network is proposed to mitigate the possible block permutation problem. Simulations are carried out to analyze the separation performance and the accuracy of the output channel arrangement for both seen and unseen speakers.
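The abstract names instance normalization (IN) and adaptive instance normalization (AdaIN) as the mechanism that separates content from speaker information, but does not describe the network itself. The following is only a minimal NumPy sketch of the two operations in their generic form, not the thesis's implementation. The function names, the `(channels, frames)` feature layout, and the `gamma`/`beta` vectors (assumed here to be derived from some speaker embedding) are illustrative assumptions.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of a single utterance over time.

    x: array of shape (channels, frames). Removing the per-channel mean
    and scale discards utterance-level statistics, which is the usual
    argument for why IN suppresses speaker-dependent information.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / std

def adaptive_instance_norm(content, gamma, beta, eps=1e-5):
    """Re-impose per-channel scale and shift on normalized content features.

    gamma, beta: arrays of shape (channels,). In this sketch they are
    hypothetical speaker-dependent parameters supplied by the caller.
    """
    return gamma[:, None] * instance_norm(content, eps) + beta[:, None]

# Toy example: 4 feature channels, 200 time frames (values are synthetic).
rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(4, 200))
gamma = np.array([0.5, 1.0, 1.5, 2.0])   # assumed speaker-dependent scales
beta = np.array([-1.0, 0.0, 1.0, 2.0])   # assumed speaker-dependent shifts
styled = adaptive_instance_norm(features, gamma, beta)
```

After the call, each channel of `styled` has a mean close to the corresponding entry of `beta` and a standard deviation close to the corresponding entry of `gamma`, regardless of the statistics of `features`. This is the sense in which AdaIN lets a decoder be conditioned on a chosen speaker.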
Keywords/Search Tags:frequency domain joint blind source separation, independent vector analysis, target speech extraction, global permutation problem, Gaussian mixture model, variational autoencoder