
Research On Speaker Verification Technology Based On Federated Learning

Posted on: 2022-11-05    Degree: Master    Type: Thesis
Country: China    Candidate: Y Q Wang
GTID: 2568306839988429    Subject: Computer Science and Technology
Abstract/Summary:
Automatic speaker verification (ASV) aims to verify whether an utterance belongs to a specific speaker, given that speaker's known utterances. Methods based on probability statistics and methods based on deep learning are the two current mainstream research directions. In the training stage, both types of methods require a large amount of speech data from many different speakers. However, speech data often contains sensitive features that reveal the speaker's identity, such as accent and pitch. Centralized model training will therefore become infeasible once the Personal Information Protection Law, a privacy-preservation regulation, is enforced. Current speaker verification research thus faces two challenges: data scarcity and privacy protection. Federated learning can alleviate data scarcity when building a machine learning model while protecting data privacy. This dissertation designs two new federated speaker verification methods, one based on probability statistics and one based on deep learning.

For the probability-statistics approach, we propose a novel framework named Federated Speaker Verification based on GMM-UBM, or Fed GU. Fed GU alleviates the privacy leakage caused by model parameters transferred between the clients and the server while maintaining speaker verification accuracy. In Fed GU, we first design SHS (Selecting and Hiding Sensitive Information), a speech-data desensitization scheme for speaker verification that selects and hides sensitive features in speech data. Second, after the clients' speech data has been desensitized, the central server cooperates with the clients to construct the UBM (Universal Background Model). Finally, each client receives the model parameters from the central server and adapts them to its local data, obtaining the client's speaker verification model, a GMM (Gaussian Mixture Model).

For the deep-learning approach, we first verify experimentally that data heterogeneity hurts the verification accuracy of a federated speaker verification model. Second, we propose a novel neural-network framework named Federated Speaker Verification based on End-to-End Architecture, or Fed ETE, which alleviates the low verification accuracy caused by data heterogeneity. To deal with data heterogeneity, Fed ETE includes a new module named Local Data Rebalance, or LDR. LDR re-arranges the client training sequence to construct a virtual dataset with low data bias, which reduces the degree of data heterogeneity and improves speaker verification accuracy.
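The Fed GU pipeline (server-assisted UBM construction followed by local adaptation) can be illustrated with a minimal pure-Python sketch of the classic GMM-UBM recipe. This is an assumption, not the thesis's implementation: it assumes the UBM is trained by federated EM, where each client uploads only per-component zeroth-, first- and second-order statistics of its (already desensitized) features, and that local adaptation is mean-only MAP adaptation with a relevance factor of 16, as in the standard GMM-UBM system. The helper names (`client_stats`, `server_update`, `map_adapt_means`, `llr_score`) are hypothetical, the features are synthetic, and SHS itself is not modeled.

```python
import math
import random

# A diagonal-covariance GMM is a tuple (weights, means, variances):
# weights is a list of K floats; means and variances are K lists of D floats.


def log_gauss_diag(x, mean, var):
    """Log density of vector x under a diagonal Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def frame_posteriors(x, gmm):
    """Return log p(x | gmm) and the component responsibilities of x."""
    weights, means, variances = gmm
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    total = top + math.log(sum(math.exp(l - top) for l in logs))
    return total, [math.exp(l - total) for l in logs]


def client_stats(features, gmm):
    """Client side (hypothetical): per-component statistics, no raw frames."""
    K, D = len(gmm[0]), len(features[0])
    n = [0.0] * K
    f = [[0.0] * D for _ in range(K)]
    s = [[0.0] * D for _ in range(K)]
    for x in features:
        _, gamma = frame_posteriors(x, gmm)
        for k in range(K):
            n[k] += gamma[k]
            for d in range(D):
                f[k][d] += gamma[k] * x[d]
                s[k][d] += gamma[k] * x[d] ** 2
    return n, f, s


def server_update(all_stats, var_floor=1e-3):
    """Server side (hypothetical): sum client statistics, one EM M-step."""
    K, D = len(all_stats[0][0]), len(all_stats[0][1][0])
    n = [max(sum(st[0][k] for st in all_stats), 1e-10) for k in range(K)]
    weights = [nk / sum(n) for nk in n]
    means, variances = [], []
    for k in range(K):
        mu = [sum(st[1][k][d] for st in all_stats) / n[k] for d in range(D)]
        ex2 = [sum(st[2][k][d] for st in all_stats) / n[k] for d in range(D)]
        means.append(mu)
        variances.append([max(e - m ** 2, var_floor) for e, m in zip(ex2, mu)])
    return weights, means, variances


def map_adapt_means(features, ubm, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one speaker's enrollment data."""
    n, f, _ = client_stats(features, ubm)
    weights, means, variances = ubm
    adapted = []
    for k, mu in enumerate(means):
        alpha = n[k] / (n[k] + relevance)  # data-dependent adaptation weight
        ex = [fd / max(n[k], 1e-10) for fd in f[k]]
        adapted.append([alpha * e + (1 - alpha) * m for e, m in zip(ex, mu)])
    return weights, adapted, variances


def llr_score(features, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio between speaker model and UBM."""
    return sum(frame_posteriors(x, speaker_gmm)[0] - frame_posteriors(x, ubm)[0]
               for x in features) / len(features)


# Toy run with synthetic 2-D "features" held by two clients.
random.seed(0)


def sample(center, count):
    return [[random.gauss(c, 1.0) for c in center] for _ in range(count)]


clients = [sample([0, 0], 200) + sample([4, 4], 50),
           sample([4, 4], 200) + sample([0, 0], 50)]
ubm = ([0.5, 0.5], [[1.0, 1.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])
for _ in range(10):  # federated EM rounds
    ubm = server_update([client_stats(c, ubm) for c in clients])

speaker = map_adapt_means(sample([5, 3], 40), ubm)
genuine = llr_score(sample([5, 3], 40), speaker, ubm)
impostor = llr_score(sample([-1, 1], 40), speaker, ubm)
print(f"genuine LLR={genuine:.3f}, impostor LLR={impostor:.3f}")
```

A trial would then be accepted when its log-likelihood ratio exceeds a threshold tuned on development data; threshold selection is left out of this sketch. Summing sufficient statistics on the server yields the same M-step as pooled EM, which is why this decomposition is a natural fit for federated UBM training.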
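The abstract does not specify LDR's rebalancing criterion. The Python sketch below illustrates one plausible reading under stated assumptions: each client reports only a histogram over some grouping attribute (here a hypothetical gender split), and the server greedily orders clients so that the data accumulated along the training sequence, the "virtual dataset", stays as close as possible in L1 distance to the global distribution. The function names `distribution_gap` and `rebalance_order` and the greedy L1 criterion are assumptions, not the thesis's algorithm.

```python
from collections import Counter


def distribution_gap(counts, target):
    """L1 distance between normalized counts and a target distribution."""
    total = sum(counts.values())
    keys = set(counts) | set(target)
    return sum(abs(counts.get(k, 0) / total - target.get(k, 0.0)) for k in keys)


def rebalance_order(client_hists):
    """Greedily order clients so every prefix of the training sequence (the
    "virtual dataset" seen so far) stays close to the global distribution."""
    global_counts = Counter()
    for hist in client_hists.values():
        global_counts.update(hist)
    grand_total = sum(global_counts.values())
    target = {k: v / grand_total for k, v in global_counts.items()}

    remaining = dict(client_hists)
    running = Counter()
    order = []
    while remaining:
        # Ties are broken by client id so the result is deterministic.
        best = min(remaining,
                   key=lambda cid: (distribution_gap(running + remaining[cid], target), cid))
        running.update(remaining.pop(best))
        order.append(best)
    return order


# Hypothetical per-client histograms over a grouping attribute.
clients = {
    "A": Counter({"male": 90, "female": 10}),
    "B": Counter({"male": 85, "female": 15}),
    "C": Counter({"male": 10, "female": 90}),
    "D": Counter({"male": 15, "female": 85}),
}
print(rebalance_order(clients))
```

In this toy case the first two clients selected are one male-heavy and one female-heavy client, so the virtual dataset is already balanced after two steps instead of drifting toward one group.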
Keywords/Search Tags: speaker verification, probability statistics, deep learning, privacy preservation, federated learning