Robust Speaker Recognition Based On Multi-layer Feature Aggregation And Adversarial Learning

Posted on:2024-08-06

Degree:Master

Type:Thesis

Country:China

Candidate:H Chang

Full Text:PDF

GTID:2568307100480634

Subject:Master of Electronic Information (Professional Degree)

Abstract/Summary:

Speaker recognition is a technology that identifies a speaker from features of the speaker's speech, and it has been widely used in financial security, criminal forensics, venue security, and other fields. In practical applications, the performance of speaker recognition systems degrades noticeably under environmental noise, which hinders the practical application and development of the technology. This thesis studies robust speaker recognition algorithms based on multi-layer feature aggregation and adversarial learning. The main research work is as follows:

1. A multi-layer feature aggregation based adversarial network (MFA-AN) is proposed to extract noise-robust speaker embedding vectors through a multi-layer feature aggregation mechanism and an adversarial learning mechanism. The MFA-AN consists of four parts: a feature enhancement network, an embedding extraction network based on multi-layer feature aggregation, a classifier, and a discriminator. The feature enhancement network learns the mapping between noisy speech features and clean speech features, and uses it to extract enhanced features that are close to clean speech features from the noisy speech features. The embedding extraction network is jointly trained with the classifier to extract, from the enhanced features, speaker embedding vectors that characterize the speaker's identity. The embedding extraction network and the discriminator are trained adversarially so that residual noise information in the enhanced features is not encoded into the speaker embedding vectors. Experimental results show that, compared with some baseline systems, the MFA-AN-based speaker recognition algorithm achieves better recognition performance under noisy conditions.

2. Building on the embedding extraction network based on multi-layer feature aggregation, a multi-layer feature aggregation residual block is designed using a channel-weighted attention mechanism and a residual connection mechanism. A multi-layer feature weighted aggregation and residual connection based adversarial network (MFA-CA-Res-AN) is proposed, in which these residual blocks form a new embedding extraction network. In the embedding extraction network of MFA-CA-Res-AN, a weight learning network first learns weight coefficients for the output features of the different network layers; each layer's output features are then scaled by the corresponding weight coefficients to highlight output features that contain more speaker information and to weaken those that contain more speaker-independent information. Experimental results show that the MFA-CA-Res-AN-based speaker recognition algorithm outperforms some baseline systems and the MFA-AN-based algorithm in noisy environments.
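
The layer-weighting step described for MFA-CA-Res-AN can be illustrated with a minimal sketch in plain Python. It is not the thesis implementation. The abstract does not specify the weight learning network or how the weighted outputs are combined, so the sketch assumes a single hypothetical score vector (`score_vector`) in place of that network, softmax normalization of the per-layer scores, and an element-wise sum of the weighted layer outputs. All function names are illustrative.

```python
import math


def softmax(scores):
    """Turn raw per-layer scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frames):
    """Average a (time x channel) feature map over the time axis."""
    n_frames = len(frames)
    return [sum(column) / n_frames for column in zip(*frames)]


def layer_weights(layer_outputs, score_vector):
    """Hypothetical stand-in for the weight learning network.

    Each layer gets one scalar score: the dot product of its time-averaged
    channel vector with `score_vector` (an assumed learnable parameter).
    """
    scores = []
    for frames in layer_outputs:
        pooled = mean_pool(frames)
        scores.append(sum(p * w for p, w in zip(pooled, score_vector)))
    return softmax(scores)


def weighted_aggregate(layer_outputs, weights):
    """Scale each layer's output by its weight and sum the results.

    Summation is an assumption; the abstract only says the coefficients
    are applied to the outputs of the corresponding layers.
    """
    n_frames = len(layer_outputs[0])
    n_channels = len(layer_outputs[0][0])
    out = [[0.0] * n_channels for _ in range(n_frames)]
    for frames, w in zip(layer_outputs, weights):
        for t in range(n_frames):
            for c in range(n_channels):
                out[t][c] += w * frames[t][c]
    return out


# Two toy layer outputs, each with 2 frames and 2 channels.
layers = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 2.0], [0.0, 2.0]],
]
weights = layer_weights(layers, score_vector=[0.0, 1.0])
aggregated = weighted_aggregate(layers, weights)
```

In this toy input the second layer receives the higher score, so its output dominates the aggregated features, which mirrors the stated goal of highlighting layers that carry more speaker information.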

Keywords/Search Tags:

Speaker recognition, Multi-layer feature aggregation, Adversarial network, Speaker embedding, Residual connection

Related items

1	Research On Speaker Recognition Based On Deep Learning
2	End-to-End Speaker Embedding For Speaker Recognition In The Wild
3	Research On Speaker Verification In Complex Scenarios
4	Speaker Recognition Research Based On GMM Speaker Clustering Technology
5	Research On Speaker Representation Based On MG Training Criteria
6	Research On Improvement Of Speaker Recognition Algorithms Based On Hand-held Device
7	Any Text Speaker Recognition System
8	Research On Methods Of Improving The Representation Ability Of Speaker Recognition Models
9	Studies On Speaker Recognition Based On SVM And GMM
10	Speaker Recognition Algorithm Based On Residual Network