Research On Synthetic Speech Detection And Application In Automatic Speaker Verification

Posted on:2024-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:L Wu

Full Text:PDF

GTID:2568307061485844

Subject:Computer Science and Technology

Abstract/Summary:

As a biometric technology, Automatic Speaker Verification (ASV) has many applications, such as smart-home devices, bank identity verification, and the identification of criminal suspects. The security of ASV is therefore an important research topic. In recent years, speech synthesis has matured to the point where it can generate audio that is difficult to distinguish from genuine speech, which poses a serious threat to ASV: an attacker can use synthetic speech to illegally obtain a user's access rights and defeat voice-based access control. Developing Synthetic Speech Detection technology is therefore key to ensuring the security of ASV systems. This thesis focuses mainly on Synthetic Speech Detection. Its main work and contributions are as follows.

1. Different acoustic features were used for Synthetic Speech Detection, and the models trained on different features were fused. First, LFCC features were chosen as the input, a ResNet-based architecture was designed for the task, and several loss functions were compared. Then, the thesis analyzes FBANK performance under different window lengths and window functions, and designs a data augmentation scheme to further improve detection performance. Finally, a pre-trained model was adopted and several fine-tuning strategies were compared. The final fused model achieves an EER of 0.12% and a min t-DCF of 0.0063.

2. Synthetic Speech Detection has to serve ASV systems, and the telephone is a common ASV scenario. Speech transmitted over telephone networks such as VoIP and PSTN passes through codecs, so Synthetic Speech Detection must remain robust across different codecs. To address this problem, this thesis proposes a codec data augmentation method and the MKTDNN architecture. In each convolution block of MKTDNN, feature maps produced by convolutions with different kernel sizes are fused efficiently by a feature fusion module. This system achieves an EER of 3.85% and a min t-DCF of 0.2753 on ASVspoof 2021, a relative improvement of 17.9% in EER and 4.5% in min t-DCF over the best baseline.

3. This thesis explores how to apply Synthetic Speech Detection systems to ASV systems, designing a separate method to improve the performance of each. PLDA was chosen as the ASV back end, and because the test data are mismatched with the training data, domain adaptation was applied to improve speaker verification performance. The Synthetic Speech Detection component follows the approach of the preceding work. A score-level fusion method was designed to combine the two systems. The final system achieves an SASV-EER of 1.04% and an SV-EER of 1.46% in the SASV challenge.
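The abstract does not give the front-end settings used in the thesis. As a minimal sketch, assuming NumPy, 16 kHz audio and conventional defaults (25 ms windows with a 10 ms hop, 80 mel filters for FBANK, 70 linear filters and 20 cepstra for LFCC, none of which come from the thesis), the two features differ only in how the triangular filters are spaced and in the DCT applied after the log. Window length and window function are exposed as parameters so they can be varied, as in the FBANK analysis. All function names are hypothetical.

```python
import numpy as np

# Window functions that can be swapped in (assumed set, for illustration only).
WINDOWS = {"hamming": np.hamming, "hanning": np.hanning,
           "blackman": np.blackman, "rectangular": np.ones}

def frame_signal(x, sr, win_ms=25.0, hop_ms=10.0, window="hamming"):
    """Cut a 1-D waveform into overlapping frames and apply a window function."""
    win = int(round(sr * win_ms / 1000))
    hop = int(round(sr * hop_ms / 1000))
    x = np.asarray(x, dtype=float)
    if len(x) < win:
        x = np.pad(x, (0, win - len(x)))
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * WINDOWS[window](win)[None, :]

def power_spectrum(frames):
    """|FFT|^2 per frame; FFT size is the next power of two >= frame length."""
    n_fft = 1 << (frames.shape[1] - 1).bit_length()
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2, n_fft

def triangular_filterbank(edges_hz, n_fft, sr):
    """Triangular filters; filter m spans edges_hz[m] .. edges_hz[m + 2]."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((len(edges_hz) - 2, len(freqs)))
    for m in range(len(edges_hz) - 2):
        lo, c, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (freqs - lo) / (c - lo)
        falling = (hi - freqs) / (hi - c)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_filterbank(x, sr, edges_hz, **frame_kw):
    """Log energies of the power spectrum after triangular filtering."""
    spec, n_fft = power_spectrum(frame_signal(x, sr, **frame_kw))
    fb = triangular_filterbank(edges_hz, n_fft, sr)
    return np.log(spec @ fb.T + 1e-10)  # small floor avoids log(0)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def fbank(x, sr, n_filt=80, **frame_kw):
    """FBANK: filters equally spaced on the mel scale, then log."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filt + 2))
    return log_filterbank(x, sr, edges, **frame_kw)

def lfcc(x, sr, n_filt=70, n_ceps=20, **frame_kw):
    """LFCC: filters equally spaced in Hz, log, then orthonormal DCT-II."""
    log_e = log_filterbank(x, sr, np.linspace(0.0, sr / 2, n_filt + 2), **frame_kw)
    n = np.arange(n_filt)
    k = np.arange(n_ceps)[:, None]
    dct = np.sqrt(2.0 / n_filt) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filt))
    dct[0] /= np.sqrt(2.0)  # orthonormal scaling of the DC row
    return log_e @ dct.T
```

For example, `fbank(x, 16000, win_ms=50.0, window="blackman")` recomputes FBANK with a longer Blackman window, which is the kind of variation the FBANK analysis describes.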
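The EER figures reported above are the operating point where the false-rejection rate on bona fide speech equals the false-acceptance rate on spoofed speech. A minimal NumPy sketch, assuming higher scores mean "more likely bona fide", sweeps every observed score as a threshold and takes the point where the two error rates are closest. The function name is hypothetical, and official evaluation scripts may interpolate between thresholds rather than use this step approximation.

```python
import numpy as np

def compute_eer(bona_scores, spoof_scores):
    """Approximate equal error rate from detection scores (higher = bona fide)."""
    bona = np.asarray(bona_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bona, spoof]))
    # False rejection: bona fide trials scored below the threshold.
    frr = np.array([(bona < t).mean() for t in thresholds])
    # False acceptance: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(frr - far))  # threshold where the two rates meet
    return (frr[i] + far[i]) / 2
```

This loop is quadratic in the number of trials, which is fine for illustration; a production version would sort once and use cumulative counts.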
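The abstract does not describe how its codec data augmentation is implemented. One simple, assumed way to expose a detector to telephone-codec distortion is to pass training audio through a simulated G.711-style μ-law channel with some probability. The sketch uses the continuous μ-law curve with 8-bit quantization rather than the segmented table of the actual G.711 standard, and covers only this one codec; `mulaw_channel`, `codec_augment` and the probability `p` are hypothetical.

```python
import numpy as np

def mulaw_channel(x, mu=255, bits=8):
    """Simulated mu-law channel: compress, quantize to 2**bits codes, expand.
    x is a float waveform scaled to [-1, 1]."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    levels = 2 ** bits
    codes = np.round((compressed + 1.0) / 2.0 * (levels - 1))  # integers 0 .. levels-1
    decoded = codes / (levels - 1) * 2.0 - 1.0
    return np.sign(decoded) * ((1.0 + mu) ** np.abs(decoded) - 1.0) / mu

def codec_augment(x, rng, p=0.5):
    """With probability p, replace the waveform by its codec-degraded version."""
    return mulaw_channel(x) if rng.random() < p else np.asarray(x, dtype=float)
```

Applied on the fly during training, for example `codec_augment(wave, np.random.default_rng(), p=0.5)`, this keeps half of the examples clean and degrades the rest; a fuller scheme would sample among several real codecs and bit rates.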
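The abstract says only that each MKTDNN block fuses feature maps from convolutions with different kernel sizes through a feature fusion module; it does not specify that module. The sketch below assumes a selective-kernel-style fusion on a single (channels × frames) feature map: per-branch depthwise 1-D convolutions over time, a global average-pooled descriptor, and a softmax over branches that yields per-channel branch weights. It illustrates the general idea, not the thesis's architecture; all names, shapes and weight matrices are hypothetical.

```python
import numpy as np

def depthwise_conv1d(x, kernels):
    """x: (C, T); kernels: (C, K). Per-channel convolution with 'same' output length."""
    return np.stack([np.convolve(x[c], kernels[c], mode="same") for c in range(x.shape[0])])

def multi_kernel_block(x, branch_kernels, attn_weights):
    """Assumed selective-kernel fusion of branches with different kernel sizes.
    branch_kernels: list of B arrays of shape (C, K_b), one per kernel size.
    attn_weights:   (B, C, C) matrices mapping the pooled descriptor to branch logits.
    Returns the fused (C, T) map and the (B, C) branch weights."""
    U = np.stack([depthwise_conv1d(x, k) for k in branch_kernels])  # (B, C, T)
    s = U.sum(axis=0).mean(axis=1)                                   # (C,) global descriptor
    logits = np.einsum("bij,j->bi", attn_weights, s)                 # (B, C)
    logits -= logits.max(axis=0, keepdims=True)                      # numerical stability
    a = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)  # softmax over branches
    return np.einsum("bc,bct->ct", a, U), a
```

The softmax makes the branch weights for each channel sum to one, so the block can lean on short or long kernels per channel; with all-zero attention weights it reduces to a plain average of the branches.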
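The abstract does not state the baseline's own scores on ASVspoof 2021. Assuming "relative improvement" means (baseline − system) / baseline, the baseline values implied by the reported figures are approximately:

```latex
\[
\mathrm{EER}_{\mathrm{base}} = \frac{3.85\%}{1 - 0.179} \approx 4.69\%,
\qquad
\text{min t-DCF}_{\mathrm{base}} = \frac{0.2753}{1 - 0.045} \approx 0.288
\]
```

Because the published percentages are rounded, these back-calculated values are only approximate.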
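The abstract does not name the domain adaptation technique used with the PLDA back end. As one well-known option, assumed here for illustration, CORAL (correlation alignment) whitens out-of-domain training embeddings with their own covariance and recolors them with the in-domain covariance before PLDA training. This sketch also shifts the mean to the in-domain mean, which goes beyond the original CORAL formulation. The function names and the regularization constant `reg` are hypothetical.

```python
import numpy as np

def _sym_matrix_power(C, p):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, 1e-10, None)
    return (vecs * vals ** p) @ vecs.T

def coral(source, target, reg=1.0):
    """Align source embeddings (Ns, D) to the target domain's statistics.
    Whitening uses the regularized source covariance, recoloring the target one."""
    eye = np.eye(source.shape[1])
    cov_s = np.cov(source, rowvar=False) + reg * eye
    cov_t = np.cov(target, rowvar=False) + reg * eye
    centered = source - source.mean(axis=0)
    transform = _sym_matrix_power(cov_s, -0.5) @ _sym_matrix_power(cov_t, 0.5)
    return centered @ transform + target.mean(axis=0)
```

The adapted source embeddings would then replace the original ones when training PLDA; only unlabeled in-domain embeddings are needed to estimate the target statistics.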
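The abstract does not detail its score-level fusion of the ASV and Synthetic Speech Detection systems. A common simple scheme, assumed here, z-normalizes each system's scores with statistics from a development set and takes a weighted sum, so that a trial scores high only when the ASV system accepts the speaker and the detector judges the audio bona fide. The weight `w` and the function names are hypothetical; in practice `w` would be tuned on development data.

```python
import numpy as np

def znorm(scores, ref_scores):
    """Normalize scores by the mean and standard deviation of reference scores."""
    ref = np.asarray(ref_scores, dtype=float)
    return (np.asarray(scores, dtype=float) - ref.mean()) / (ref.std() + 1e-12)

def fuse_scores(asv, cm, asv_ref, cm_ref, w=0.5):
    """Weighted sum of normalized ASV and countermeasure (CM) scores.
    Higher = more likely a bona fide trial from the target speaker."""
    return w * znorm(asv, asv_ref) + (1.0 - w) * znorm(cm, cm_ref)
```

Normalizing first matters because the two systems' raw scores typically live on very different scales; without it, the weight would mostly compensate for scale rather than express relative trust.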

Keywords/Search Tags:

Automatic Speaker Verification, Synthetic Speech Detection, Feature Fusion

Related items

1	A Study On The Countermeasures Of The Automatic Speaker Verification System Against Synthetic Speech
2	Study On The Deception Detection Method Identified By The Automatic Speaker Verification System
3	Research On Speech Spoofing Detection Based On Feature Fusion And Residual Attention
4	Automatic speechreading for improved speech recognition and speaker verification
5	Application Of Playback Speech Detection Method Based On AdaBoost Algorithm In Automatic Speaker Verification
6	Research On Synthetic And Converted Speech Detection Based On Multi-branch Convolutional Neural Network
7	Research On Speaker Verification System Based On Perceptual Log Area Ratio
8	Research And Implementation Of Speaker Recognition Method For Anti-playback Fake Speech
9	Research On Feature Of Speaker Verification And Playback Attacks Detection
10	Synthetic Speech Detection Using Multi-Domain Features