
Robust Speaker Recognition Based On Multi-layer Feature Aggregation And Adversarial Learning

Posted on: 2024-08-06    Degree: Master    Type: Thesis
Country: China    Candidate: H Chang
GTID: 2568307100480634    Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Speaker recognition is a technology that identifies a speaker from the characteristics of his or her speech, and it has been widely used in financial security, criminal forensics, venue security, and other fields. In practical applications, however, the performance of speaker recognition systems degrades markedly under environmental noise, which hinders the deployment and further development of the technology. This thesis studies robust speaker recognition algorithms based on multi-layer feature aggregation and adversarial learning. The main research work is as follows:

1. A multi-layer feature aggregation based adversarial network (MFA-AN) is proposed to extract noise-robust speaker embedding vectors through a multi-layer feature aggregation mechanism and an adversarial learning mechanism. MFA-AN consists of four parts: a feature enhancement network, an embedding extraction network based on multi-layer feature aggregation, a classifier, and a discriminator. The feature enhancement network learns the mapping from noisy speech features to clean speech features, so that it produces enhanced features close to the clean ones. The embedding extraction network is trained jointly with the classifier to extract, from the enhanced features, speaker embedding vectors that characterize the speaker's identity. The embedding extraction network is also trained against the discriminator so that residual noise information left in the enhanced features is not encoded into the speaker embedding vectors. Experimental results show that, compared with several baseline systems, the MFA-AN-based speaker recognition algorithm achieves better recognition performance under noisy conditions.

2. Building on the embedding extraction network based on multi-layer feature aggregation, a multi-layer feature aggregation residual block is designed using a channel-weighted attention mechanism and a residual connection mechanism. Using these residual blocks to build a new embedding extraction network yields a multi-layer feature weighted aggregation and residual connection based adversarial network (MFA-CA-Res-AN). In the embedding extraction network of MFA-CA-Res-AN, a weight learning network first learns weight coefficients for the output features of the different network layers; each coefficient is then applied to the output features of the corresponding layer, emphasizing outputs that carry more speaker information and suppressing outputs that carry more speaker-independent information. Experimental results show that, in noisy environments, the MFA-CA-Res-AN-based speaker recognition algorithm outperforms several baseline systems as well as the MFA-AN-based algorithm.
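The MFA-AN training described in item 1 combines three goals: enhanced features should match clean features, embeddings should identify the speaker, and embeddings should not betray residual noise to the discriminator. The abstract does not state the loss functions, so the following is only a minimal sketch under stated assumptions: mean squared error for enhancement, softmax cross-entropy for the classifier, and a GAN-style discriminator that outputs the probability that an embedding came from clean speech. The names `mfa_an_losses`, `d_clean`, `d_enh` and the weight `lam` are hypothetical. The sketch only evaluates the two objectives; it performs no gradient updates, and a gradient reversal layer is a common alternative way to implement the same adversarial idea.

```python
import math

def mse(pred, target):
    """Enhancement loss: mean squared error between enhanced and clean features."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(logits, label):
    """Speaker classification loss: softmax cross-entropy for one utterance."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy of a probability against a 0/1 target."""
    return -(target * math.log(prob + eps) + (1 - target) * math.log(1 - prob + eps))

def mfa_an_losses(enhanced, clean, spk_logits, spk_label, d_clean, d_enh, lam=0.1):
    """Return (discriminator loss, enhancer + extractor loss) for one sample.

    Assumption (not from the thesis): d_clean and d_enh are the discriminator's
    probabilities that the clean-speech embedding and the embedding of the
    enhanced noisy features, respectively, came from clean speech.
    lam is a hypothetical weight on the adversarial term.
    """
    # Discriminator objective: label clean-speech embeddings 1, enhanced-noisy ones 0.
    l_disc = bce(d_clean, 1.0) + bce(d_enh, 0.0)
    # Enhancer/extractor objective: match clean features, classify the speaker,
    # and make enhanced-noisy embeddings look clean to the discriminator.
    l_adv = bce(d_enh, 1.0)
    l_main = mse(enhanced, clean) + cross_entropy(spk_logits, spk_label) + lam * l_adv
    return l_disc, l_main

if __name__ == "__main__":
    l_d, l_m = mfa_an_losses([0.9, 0.1], [1.0, 0.0], [2.0, 0.5, -1.0], 0,
                             d_clean=0.8, d_enh=0.3)
    print(f"discriminator loss={l_d:.4f}, main loss={l_m:.4f}")
```

In an actual training loop the two losses would be minimized in alternation: the discriminator's parameters on `l_disc`, and the enhancement network, embedding extractor, and classifier on `l_main`. The main loss drops as the discriminator is fooled more often, which is what pushes noise information out of the embeddings.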
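The weighting step in item 2 (learn one coefficient per layer output, scale each output by its coefficient, then add a residual path) can be illustrated with a small sketch. The abstract does not specify the weight learning network, so several assumptions are made here: the network is replaced by a hypothetical linear scorer (`w`, `b`) applied to each layer's time-averaged features, the scores are normalized across layers with a softmax, each layer gets a single scalar weight (a channel-wise variant would produce one weight per layer per channel), and the weighted outputs are summed rather than concatenated so that the residual connection can add the block input directly. The function name `weighted_aggregation_block` is likewise hypothetical.

```python
import math

def time_mean(frames):
    """Squeeze step: average a (T x C) feature map over time -> length-C vector."""
    T = len(frames)
    return [sum(f[c] for f in frames) / T for c in range(len(frames[0]))]

def softmax(xs):
    """Normalize scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_aggregation_block(block_input, layer_outputs, w, b=0.0):
    """Hypothetical weighted multi-layer aggregation with a residual connection.

    block_input:   T x C input to the block (used for the residual path).
    layer_outputs: list of L feature maps, each T x C, from the block's layers.
    w, b:          parameters of a toy linear "weight learning network" that
                   scores each layer from its time-averaged features.
    Returns (output T x C, per-layer weights).
    """
    # One score per layer, computed from that layer's time-averaged features.
    scores = [sum(wi * si for wi, si in zip(w, time_mean(x))) + b
              for x in layer_outputs]
    alphas = softmax(scores)  # larger weight -> layer output is emphasized
    T, C = len(block_input), len(block_input[0])
    out = []
    for t in range(T):
        row = []
        for c in range(C):
            agg = sum(a * x[t][c] for a, x in zip(alphas, layer_outputs))
            row.append(block_input[t][c] + agg)  # residual connection
        out.append(row)
    return out, alphas

if __name__ == "__main__":
    x_in = [[0.0, 0.0], [0.0, 0.0]]
    layer_a = [[2.0, 0.0], [2.0, 0.0]]
    layer_b = [[0.0, 1.0], [0.0, 1.0]]
    out, alphas = weighted_aggregation_block(x_in, [layer_a, layer_b], w=[1.0, 0.0])
    print("layer weights:", alphas)
    print("block output:", out)
```

With `w=[1.0, 0.0]` the scorer favors layers whose first channel is strong, so `layer_a` receives the larger weight and dominates the aggregated output. In a trained network `w` and `b` would be learned jointly with the rest of the embedding extractor, which is how outputs carrying more speaker information end up emphasized.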
Keywords/Search Tags: Speaker recognition, Multi-layer feature aggregation, Adversarial network, Speaker embedding, Residual connection