
Research On Multi-task Learning Based Far-field Speaker Verification

Posted on: 2022-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: W Xu
Full Text: PDF
GTID: 2518306569466034
Subject: Control Engineering
Abstract/Summary:
As a contactless biometric technology, speaker verification is widely used in areas such as smart homes, social security verification and financial security. The development of deep learning has further improved speaker verification technology. However, far-field verification remains a challenging research topic. Differences in recording distance create a mismatch in signal-to-noise ratio between enrollment and verification voice samples, and this mismatch severely degrades speaker verification performance.

The usual way to improve speaker verification for long-distance interaction is to apply signal enhancement to raise the quality of the far-field signal. Because enhancement introduces non-linear processing, it can easily discard part of the speaker feature information and degrade verification performance. Discrepancy compensation algorithms based on feature space dimensionality reduction and decomposition have also been applied to far-field speaker verification. These algorithms remove the effect of distance discrepancies on the verification result by first reducing the dimensionality of speaker embeddings recorded at different distances and then mapping them into the same embedding space. However, they require a separately trained model and are not an end-to-end solution. In addition, their verification performance depends on the choice of embedding space and on the probability distribution of the speaker embeddings.

To address these shortcomings of existing far-field speaker verification algorithms, this thesis makes the following contributions. First, distance labels are introduced as supervised information. Speaker classification is the primary task, and distance discrimination, which determines whether two input voice samples have the same distance label, is the auxiliary task. The auxiliary task encourages the model to learn the distance discrepancy information, and a gradient reversal layer then removes this information from the speaker embedding, making the speaker embedding insensitive to distance. Second, multi-task learning and the gradient reversal layer are combined. Multi-task learning is used to stimulate the representation of distance discrepancy information, and the gradient reversal layer is used to suppress the effect of that information on the speaker embedding. The distance discrimination task is added directly before the speaker embedding layer in the model, and the distance discrimination branch is attached again after this layer through the gradient reversal layer. Stimulating the representation of distance discrepancy information first and then suppressing it further improves far-field speaker verification performance. Third, a dynamic loss weight updating strategy adjusts the weight of each task's loss in the total loss during training. The loss weights are adjusted dynamically according to the convergence rate of each task, so that all tasks are optimized simultaneously and the distance discrepancy information between samples is fully exploited.

Experiments are conducted on the HI-MIA dataset. Far-field speaker verification is divided into two categories: near-field enrollment with far-field verification, and far-field enrollment with far-field verification. A deep residual network (ResNet) and a Time Delay Neural Network (TDNN) are each used as the base network structure, combined with multi-task learning and the gradient reversal layer, with distance discrimination as the auxiliary task. The experimental results show that the proposed structure achieves an Equal Error Rate (EER) of 6.68% and 7.07% on the two categories respectively, a relative improvement of 6.69% and 10.3% over the single-task baseline model. The proposed algorithm improves far-field speaker verification performance without signal enhancement and without separately training a back-end scoring model.
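The abstract does not say which dynamic loss weighting rule is used. As an illustration only, the sketch below implements one well-known scheme of this kind, Dynamic Weight Average, which raises the weight of a task whose loss is falling more slowly. The function name, the temperature value and the loss values are hypothetical and are not taken from the thesis.

```python
import math


def dynamic_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic Weight Average style weights (illustrative, not the thesis's rule).

    prev_losses / prev_prev_losses hold each task's loss from the last two
    epochs. The returned weights sum to the number of tasks.
    """
    # A ratio near 1 means the loss barely moved (slow convergence);
    # a small ratio means the loss dropped quickly (fast convergence).
    ratios = [p / pp for p, pp in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    num_tasks = len(ratios)
    return [num_tasks * e / total for e in exps]


# Hypothetical losses, ordered as [speaker classification, distance discrimination].
weights = dynamic_weights(prev_losses=[0.9, 0.5], prev_prev_losses=[1.0, 1.0])

# Weighted total loss for the next step, using hypothetical current losses.
current_losses = [0.85, 0.45]
total_loss = sum(w * l for w, l in zip(weights, current_losses))
```

In this example the speaker classification loss fell by only 10% while the distance discrimination loss fell by 50%, so the speaker task receives the larger weight. A higher temperature pushes the weights toward equal values.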
Keywords/Search Tags: far-field speaker verification, multi-task learning, gradient reversal layer, dynamic loss weighting strategy